The simulation results that are discussed in this report represent the
findings  obtained  during  the  period  of  the  grant,  1  July  1980  -  30  June 
1981.  Further  simulation  studies  have  been  conducted  and  will  be 
documented  in  a  Ph.D.  thesis  by  Timothy  G.  Saponas  entitled  "Distributed 
and  Decentralized  Control  in  a  Fully  Distributed  Processing  System," 
which  is  to  be  published  in  the  near  future. 
SECTION 1
INTRODUCTION 

Distributed  Processing  Systems  are  currently  receiving  a  very  large 
amount  of  attention.  This  is  due  in  part  to  the  claims  that  these  systems 
will  provide  a  number  of  advantages  over  contemporary  systems  (see  Table  1). 
Some of the more important potential advantages being publicized are the
following: increased performance (with respect to both throughput and response
time), ability to share resources, ease of system expansion, and the ability
to provide fault-tolerance.


Table  1.  "Benefits"  Provided  by  Distributed  Processing  Systems 

A  Representative  List  Assembled  from  Claims  Made  in 
Actual  Sales  Literature 

High  Availability  and  Reliability 
Reduced  Network  Costs 
High  System  Performance 
Fast  Response  Time 
High  Throughput 

Graceful  Degradation,  Fail-soft 
Ease  of  Modular  and  Incremental  Growth 
Configuration  Flexibility 
Automatic  Load  and  Resource  Sharing 
Easily  Adaptable  to  Changes  in  Workload 
Incremental  Replacement  and/or  Upgrade 
Easy  Expansion  in  Capacity  and/or  Function 
Good  Response  to  Temporary  Overloads 


This report is concerned with a particular class of distributed processing
systems, "Fully Distributed Processing Systems (FDPS)," which are the
focus of a major research program at the Georgia Institute of Technology. For
a system to be classified as an "FDPS," it must possess all five of the
following characteristics:

1. Multiplicity of resources: an FDPS is composed of a multiplicity
of "general-purpose" resources that can be freely
assigned on a short-term basis to various system tasks as
required (e.g., hardware and software processors, shared data
bases, etc.).

2. Component interconnection: the active components in the FDPS
are physically connected by a communication network(s) utilizing
two-party, cooperative protocols to control the physical
transfer of data (i.e., loose physical and logical coupling).

3.  Unity  of  control:  the  executive  control  of  an  FDPS  must  define 
and  support  a  unified  set  of  policies  governing  the  operation 
and  utilization  of  all  physical  and  logical  resources. 

4. System transparency: users must be able to request services by
generic names without being aware of their physical location or
even the fact that multiple copies of the resources may exist.
(System transparency is designed to aid rather than inhibit
and, therefore, can be overridden. A user who is concerned
about the performance of a particular application can provide
system-specific information to aid in the formulation of
management control decisions.)

5. Component autonomy: both the logical and physical components
of an FDPS should interact in a manner described as
"cooperative autonomy" [Ensl78]. This means that the components
operate in an autonomous fashion requiring cooperation
among processes for the exchange of information as well as for
the provision of services. In a cooperatively autonomous
control environment, the components are afforded the ability to
refuse requests for service, regardless of whether the service
request involves execution of a process or the use of a file.
This could result in anarchy except for the fact that all components
adhere to a common set of system utilization and
management policies expressed by the philosophy of the
executive control.

A  more  detailed  explanation  of  these  characteristics  is  found  in  Section  2  of 
this  report. 

An  essential  component  of  an  FDPS  is  the  distributed  and  decentralized 
control.  This  component  unifies  the  management  of  the  resources  of  the  FDPS 
and provides system transparency to the user. A previous study (see [Ensl81])
examined the characteristics of various models of distributed and
decentralized control that met these criteria and identified a number of
variations possible in specific features of the different models. That
research helped to define more clearly the exact nature of the operation of an
FDPS, the problems inherent in distributed and decentralized control, and
possible solutions to these problems.

The  scope  and  goal  of  the  present  work  is  to  both  qualitatively  and 
quantitatively  evaluate  the  effect  of  these  features  on  the  performance  of  the 
various  models  of  control.  The  qualitative  evaluation  is  intended  to 
demonstrate  how  a  particular  model  performs  in  a  specific  environment.  In 
this  phase,  the  validity  of  a  model  is  established.  The  quantitative 
evaluation,  on  the  other  hand,  is  intended  to  examine  in  general  the  relative 
merits  of  decentralized  control  and  provide  data  to  support  conclusions  about 
the  relative  performance  of  the  various  models. 


SECTION 2
BACKGROUND 


2.1 THE DEFINITION OF AN FDPS

Fully  Distributed  Processing  Systems  (FDPS)  were  first  defined  by  Enslow 
in  1976  [Ensl78]  although  the  designation  "fully"  was  not  added  until  1978 
when  it  became  necessary  to  clearly  distinguish  this  specific  class  of  systems 
from  the  many  others  being  presented  as  "distributed  processing  systems."  As 
discussed in Section 1, an FDPS is distinguished by the following
characteristics:


1.  Multiplicity  of  resources. 

2.  Component  interconnection. 

3.  Unity  of  control. 

4.  System  transparency. 

5.  Component  autonomy. 

It is important to note that in order for a system to qualify as being
fully distributed it must possess all five of the characteristics presented
in this definition.


2.1.1  Multiple  Resources  and  Their  Utilization 

The  requirement  for  resource  multiplicity  concerns  the  assignable 
resources  that  a  system  provides.  Therefore,  the  type  of  resources  requiring 
replication  depends  on  the  purpose  of  a  system.  For  example,  a  distributed 
system  designed  to  perform  real-time  computing  for  air  traffic  control 
requires  a  multiplicity  of  special-purpose  air  traffic  control  processors  and 
display  terminals.  It  is  not  required  that  replicated  resources  be  exactly 
homogeneous;  instead,  they  must  be  capable  of  providing  the  same  services. 


In  addition  to  the  requirement  for  multiplicity,  the  system  resources 
must  be  dynamically  reconfigurable  to  respond  to  component  failures  as  well  as 
changes  in  the  work  load  presented  to  the  system.  This  reconfiguration  must 
occur  within  a  "short"  period  of  time  so  as  to  maintain  the  functional 
capabilities of the overall system without affecting the operation of
components not directly involved. Under normal operation, the system must be
able to dynamically assign its tasks to components distributed throughout the
system.

The extent to which resources are replicated can range from those
systems  where  none  are  replicated  (not  a  fully  distributed  system)  to  systems 
with  all  assignable  resources  replicated.  In  addition,  the  number  of  copies 
of  a  particular  resource  can  vary  depending  on  the  system  and  type  of 
resource.  In  general,  the  greater  the  degree  of  replication,  particularly  of 
resources  in  high  demand,  the  greater  the  potential  for  attaining  benefits 
such  as  increased  performance  (response  time  and  throughput),  availability, 
reliability,  and  flexibility  [Ensl78]. 

2.1.2 Component Interconnection and Communication

The  extent  of  physical  distribution  of  resources  in  distributed  systems 
can  range  from  the  length  of  a  connection  between  components  on  a  single 
integrated  chip  to  the  distance  between  two  computers  communicating  through  an 
international  network.  In  addition,  interconnection  subsystem  organizations 
can  vary  from  a  single  time-shared  bus  to  a  complex,  mesh  interconnection 
network.  Since  a  component  in  a  distributed  system  communicates  with  other 
components  through  its  own  logical  process,  all  physical  and  logical  resources 
can  be  thought  of  as  processes,  and  interactions  between  resources  can  be 
referred  to  as  interprocess  communication  [Davl79].  For  example,  application 
program  interaction  with  data  files  is  accomplished  through  communication 
between  logical  processes,  the  application  process  and  the  file  process. 

In an FDPS, both the physical and logical coupling of the system
components are characterized as "extremely loose." "Gated" or "master-slave"
control  of  physical  transfers  is  not  allowed.  Communication  (i.e.,  the 
physical  transfer  of  messages)  is  accomplished  through  the  active  cooperation 
and  participation  of  both  the  sender  and  addressees.  The  primary  requirement 
of  the  interconnection  subsystem  is  that  it  support  such  a  two-party 
cooperative  protocol.  This  is  essential  to  enable  the  system's  resources  to 
exist  with  "cooperative  autonomy"  at  the  physical  level. 

The  advantages  of  using  a  message-based  (loosely-coupled)  communication 
system  with  a  two-party  cooperative  protocol  include  reliability, 
availability,  and  extensibility.  The  disadvantage  is  the  additional  overhead 
of  message  processing  incurred  to  support  this  method  of  communication.  There 
are  a  variety  of  interconnection  organizations  and  communication  techniques 
that can be used to support a message-based system with a two-party
cooperative protocol.

2.1.3  Unity  of  Control 

In  a  fully  distributed  data  processing  system,  individual  processors 
will  control  local  resources  with  their  own  local  operating  systems,  which  may 
or  may  not  be  unique.  As  a  result,  control  is  distributed  throughout  the 
system  to  control  system  components  that  operate  autonomously.  However,  to 
gain  the  benefits  of  distributed  processing,  it  is  required  that  the 
autonomous  components  of  the  system  cooperate  with  each  other  to  achieve  the 
overall objectives of the system. To ensure this, the concept of a high-level
operating  system  was  created  to  integrate  and  unify,  at  least  conceptually, 
the  decentralized  control  of  the  system. 

A high-level operating system is essential to the successful implementation
of a distributed processing system. The high-level operating system is
not a centralized block of code exercising strong hierarchical control over
the system; instead, it is a well-defined set of policies governing the
integrated operation of the system as a whole. To ensure reliable and
flexible  operation  of  the  system,  these  policies  should  be  implemented  with 
minimal  binding  to  any  of  the  system's  components  [Ensl78]. 

What  policies  are  required  and  how  they  should  be  implemented  depends 
greatly on the system. For example, if it is a general-purpose system
supporting interactive users, then a command interpreter and a user control
language are required to make the system's components compatible and
transparent to the user.

2.1.4 Transparency of System Control

The  high-level  operating  system  also  provides  the  user  with  an  interface 
to  the  distributed  system.  As  a  result,  the  user  is  accessing  the  system  as  a 
whole  rather  than  just  a  single  computer  in  the  network. 

In  order  to  increase  the  effectiveness  of  the  distributed  system,  the 
actual  system  organization  is  made  transparent.  The  user  is  presented  with  a 
virtual  machine  and  a  command  language  to  access  it.  Using  this  command 
language,  the  user  requests  services  by  name  and  does  not  need  to  specify  the 
specific  server  to  be  used.  Clearly,  multiple  requests  for  the  same  service 
might  be  assigned  to  different  servers  depending  on  the  state  of  the  total 
system when the request is made. However, to make the system truly effective
for all users, knowledgeable individuals must be able to interact with the
system  more  directly,  requesting  specific  servers  or  developing  service 
routines  to  increase  the  efficiency  or  effectiveness  of  the  system  [Ensl78]. 
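
The request-by-name behavior described above can be illustrated with a minimal sketch. It is not taken from the report's Pascal simulator; the directory contents, the `Server` record, the least-loaded selection rule, and the `preferred_node` override are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Server:
    node: str   # hypothetical node identifier
    load: int   # hypothetical number of requests queued at this instance


# Hypothetical directory mapping a generic service name to every instance.
DIRECTORY = {
    "compile": [Server("node-a", 3), Server("node-b", 1)],
    "print": [Server("node-c", 0)],
}


def select_server(service, preferred_node=None):
    """Choose the instance that will handle a request naming `service`."""
    instances = DIRECTORY[service]
    if preferred_node is not None:
        # Transparency is overridden: a knowledgeable user names the server.
        for server in instances:
            if server.node == preferred_node:
                return server
        raise LookupError(f"{service!r} is not available on {preferred_node!r}")
    # Transparent case: the system picks, here simply the least-loaded instance.
    return min(instances, key=lambda server: server.load)
```

Under these assumptions, `select_server("compile")` returns the instance on `node-b`, while `select_server("compile", "node-a")` honors the explicit choice even though that instance is more heavily loaded.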

2.1.5 Cooperative Autonomy

Cooperative autonomy has already been described at the physical
interconnection level. It is also required that all resources be autonomous at the
logical  control  level.  A  resource  must  have  complete  control  in  determining 
which  requests  it  will  service  and  what  future  operations  it  will  perform. 
However,  a  resource  must  also  cooperate  with  other  resources  by  operating 
according  to  the  policies  of  the  high-level  operating  system.  Cooperative 
autonomy  is  an  essential  prerequisite  for  systems  to  have  fault  tolerance  and 
high  degrees  of  extensibility  [Ensl78].  It  is  perhaps  the  most  important  and 
most  distinguishing  characteristic  of  a  fully  distributed  processing  system. 

2.2 CHARACTERIZATION OF DISTRIBUTED AND DECENTRALIZED CONTROL

2.2.1  General  Nature  of  FDPS  Executive  Control 

The  executive  control  is  responsible  for  managing  the  resources  of  the 
FDPS.  Its  charter  is  to  perform  the  management  function  in  such  a  manner  that 
the  resources  of  the  FDPS  are  unified  and  users  of  the  FDPS  are  shielded  from 
the  physical  realities  of  distribution.  In  other  words,  the  executive  control 
provides  system  transparency  for  the  user. 

The  executive  control  of  an  FDPS  can  be  implemented  in  many  different 
ways.  It  can  consist  of  identical  modules  replicated  on  all  nodes  of  the 
system.  Alternatively,  it  can  consist  of  several  unique  modules  distributed 
in  some  manner  about  the  system.  The  essential  point  is  that  the  term 
"executive  control"  does  not  necessarily  mean  a  particular  module  at  a 
particular  node,  but  rather  the  entire  collection  of  modules  that  are 
distributed  somehow  throughout  the  system  and  are  working  together  to  manage 
the  system's  resources. 

2.2.2 Control Problems Resulting from the FDPS Environment

Several characteristics of an FDPS are found to directly impact the
design and implementation of the executive control. These include system
transparency to the user, extremely loose physical and logical coupling, and
cooperative autonomy as the basic mode of component interaction. System
transparency means that the FDPS appears to a user as a large uniprocessor
which  has  available  a  variety  of  services.  It  must  be  possible  for  the  user 
to  obtain  these  services  by  naming  them  without  specifying  any  information 
concerning  the  details  of  their  physical  location.  The  task  of  locating  all 
appropriate  instances  (copies)  of  a  particular  resource  and  choosing  the 
instance  to  be  utilized  is  left  to  the  executive  control. 

"Cooperative  autonomy"  is  another  characteristic  of  an  FDPS  that  has  a 
large  effect  on  the  design  of  the  executive  control.  The  "lower-level" 
control  functions  of  both  the  logical  and  physical  resource  components  of  an 
FDPS are designed to operate in a "cooperatively autonomous" fashion. Thus,
the  executive  control  must  be  designed  such  that  any  resource  is  able  to 
refuse  a  request  even  though  it  may  have  physically  accepted  the  message 
containing  that  request.  Degeneration  into  total  anarchy  is  prevented  by  the 
establishment  of  a  common  set  of  criteria  to  be  followed  by  all  resources  in 
determining  whether  a  request  is  accepted  and  serviced  as  originally 
presented,  accepted  only  after  bidding  or  negotiation,  or  rejected. 
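
As a rough illustration of the three outcomes named above (accepted as presented, accepted only after negotiation, or rejected), the following sketch has every resource apply the same policy function. The queue-length thresholds and all names are hypothetical; the report does not specify the criteria at this level of detail.

```python
from enum import Enum


class Decision(Enum):
    ACCEPT = "accept"
    NEGOTIATE = "negotiate"
    REJECT = "reject"


def common_policy(queue_length, soft_limit, hard_limit):
    """Shared criteria applied by every resource (thresholds are assumed)."""
    if queue_length < soft_limit:
        return Decision.ACCEPT
    if queue_length < hard_limit:
        return Decision.NEGOTIATE
    return Decision.REJECT


class Resource:
    def __init__(self, name, soft_limit=2, hard_limit=4):
        self.name = name
        self.soft_limit = soft_limit
        self.hard_limit = hard_limit
        self.queue = []

    def receive(self, request):
        # The message has been physically accepted at this point; the
        # resource still decides on its own whether to service the request.
        decision = common_policy(len(self.queue), self.soft_limit, self.hard_limit)
        if decision is Decision.ACCEPT:
            self.queue.append(request)
        return decision
```

The point of the sketch is only that autonomy (each resource decides for itself) and cooperation (all resources use one agreed policy) can coexist.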

Another  important  FDPS  characteristic  that  definitely  affects  the  design 
of  its  executive  control  is  the  extremely  loose  coupling  of  both  physical  and 
logical  resources.  The  components  of  an  FDPS  are  connected  by  communication 
paths  of  relatively  low  bandwidth.  The  direct  sharing  of  primary  memory 
between  processors  is  not  acceptable.  Even  though  the  logical  coupling  could 
still be loose with this physical interconnection mechanism, the presence of a
single critical hardware element, the shared memory, would create
fault-tolerance limitations. Therefore, all communication takes place over
"standard" input/output paths. The actual data rates that can be supported are
primarily  a  function  of  the  interconnections  between  the  processors  and  the 
capability  of  their  input/output  paths.  The  available  transfer  rates  are  much 
less  than  memory  transfer  rates.  This  implies  that  the  sharing  of  control 
information  among  components  on  different  processors  is  greatly  restricted. 
System  control  is  forced  to  work  with  information  that  is  "out-of-date"  and, 
as  a  result,  perhaps  "inaccurate." 
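
The effect of working with out-of-date information can be made concrete with a small sketch. It assumes, purely for illustration, that each node reports its load in a message that takes a fixed `delay` to reach a remote control component; the function name and data layout are hypothetical.

```python
def observed_load(history, now, delay):
    """Return the most recent load value a remote controller can know at `now`.

    `history` is a time-ordered list of (report_time, load) pairs; a report
    sent at time t only becomes visible remotely at time t + delay.
    """
    visible = [load for t, load in history if t + delay <= now]
    return visible[-1] if visible else None


# Load reported by one node at times 0, 5, and 9.
history = [(0, 1), (5, 4), (9, 2)]
stale_view = observed_load(history, now=10, delay=3)   # report from t=9 not yet seen
current_view = observed_load(history, now=10, delay=0)  # what a shared memory would show
```

With a delay of 3 the controller still believes the load is 4 although it has already dropped to 2, which is the sense in which control information is "out-of-date" and possibly "inaccurate."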

The  control  of  an  FDPS  requires  the  participation  and  cooperation  of 
components  at  all  layers  of  the  system.  This  implies  that  there  are  elements 
of FDPS control present in the lowest levels of the hardware and software
components. This study is primarily interested in the software components of the
FDPS  control  which  are  typically  referred  to  as  "the  executive  control."  Low- 
level  aspects  of  FDPS  control  will  not  be  directly  examined. 

The  executive  control  is  responsible  for  managing  the  physical  and 
logical  resources  of  a  system.  It  accepts  user  requests  and  obtains  and 
schedules  the  resources  necessary  to  satisfy  a  user's  needs.  The  manner  in 
which these tasks are accomplished is designed to unify the distributed
components of the system into a whole and provide system transparency to the
user. 

2.2.3 Why Not Centralized Control?

Why  is  a  centralized  method  of  control  not  appropriate?  In  systems 
utilizing  a  centralized  executive  control,  all  of  the  control  processes  share 
a  single,  coherent,  and  accurate  view  of  the  entire  system  state.  An  FDPS, 
though,  contains  only  loosely-coupled  components,  the  communication  between 
which  is  limited  and  subject  to  variable  time  delays.  This  means  that  one 
cannot  guarantee  that  all  control  processes  will  have  the  same  view  of  the 
system  state  [Jens78].  In  fact,  it  is  a  significant  characteristic  of  an  FDPS 
that  all  control  processes  will  probably  not  have  a  consistent  view. 

A  centralized  executive  control  weakens  the  fault-tolerance  of  the 
overall  system  due  to  the  existence  of  a  single  critical  element,  the 
executive  control  component  itself.  This  obstacle,  though,  is  not 
insurmountable. Strategies do exist for providing fault-tolerance in
centralized applications. Garcia-Molina [Garc79], for example, has described
a scheme for providing fault-tolerance in a distributed data base management
system  with  a  centralized  control.  Approaches  of  this  type  typically  assume 
that  failures  are  extremely  rare  events  and  that  the  system  can  tolerate  the 
dedication  of  a  relatively  long  interval  of  time  to  reconfiguration.  These 
restrictions  may  be  unacceptable  in  an  FDPS  environment  in  which  it  is 
important to provide fault-tolerance with a minimum of disruption to the
services being supported.

Also,  the  extremely  important  issue  of  overall  system  performance  must 
be  considered.  A  distributed  processing  system  is  expected  to  utilize  a  large 
quantity  and  a  wide  variety  of  resources.  If  a  completely  centralized 
executive control is implemented, there is a high probability that a
bottleneck will be created in the node executing the control functions. A
distributed and decentralized approach to control attempts to remove this
bottleneck by dispersing the control decisions among multiple components on
different nodes.

2.2.4 Distributed vs. Decentralized

The  discussion  above  supports  the  requirement  that  the  executive  control 
of  an  FDPS  must  be  both  "distributed"  and  "decentralized,"  and  it  should  be 
noted  that  there  is  a  clear  distinction  between  the  terms  "distributed 
control"  and  "decentralized  control"  as  they  are  used  in  the  context  of  this 
project. "Distributed control" is characterized by having its executing
components physically located on different nodes. This means there are
multiple loci of control activity. In "decentralized control," on the other
hand, control decisions are made independently by separate components. In
other words, there are multiple loci of control decision making. Thus,
distributed  and  decentralized  control  has  active  components  located  on 
different  nodes,  and  those  components  are  capable  of  making  independent 
control  decisions. 
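
The distinction drawn above can be restated as a small classification sketch. The representation of a control configuration as (node, makes_decisions) pairs is an assumption introduced here, not a structure defined in the report.

```python
def classify(components):
    """Classify a control configuration.

    `components` is a list of (node, makes_decisions) pairs, one per
    executing control component. Returns (distributed, decentralized).
    """
    nodes = {node for node, _ in components}
    decision_makers = [c for c in components if c[1]]
    distributed = len(nodes) > 1               # multiple loci of control activity
    decentralized = len(decision_makers) > 1   # multiple loci of decision making
    return distributed, decentralized
```

For example, a single decision maker with a passive helper on another node is distributed but not decentralized, whereas two independent decision makers on two nodes satisfy both properties, which is the combination an FDPS requires.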

2.2.5 Rationale Behind Distributed and Decentralized Control

The  reasons  for  distributing  and  decentralizing  control  result  from  two 
basic  goals  of  an  FDPS,  to  improve  performance  and  to  provide  a  more  fault- 
tolerant  system.  With  decentralized  decision  making,  a  system  can  potentially 
provide  responses  to  requests  in  a  shorter  amount  of  time  due  to  the  increased 
utilization  of  resources  which  is  achieved  through  the  concurrent  execution  of 
the  decentralized  decision  makers. 

By  physically  distributing  components,  one  is  assured  that  a  system 
retains  the  potential  to  keep  running  even  though  some  parts  have  been  lost. 
The  ability  to  function  independently  of  the  lost  components  is  provided  by 
decentralized  decision  making.  Thus,  by  distributing  components  and 
decentralizing decision making, the potential for fault-tolerant operation is
provided. 

2.3 EVALUATION PLAN

The  steps  performed  in  the  evaluation  of  the  models  of  control  are  as 
follows:

1. Prepare detailed definitions of the models of control.

2.  Construct  an  FDPS  simulator. 

3.  Perform  the  simulation  experiments. 

4.  Validate  the  control  models. 

5.  Compare  the  relative  performance  data  for  the  different  control 
models. 

2.3.1 Definition of Control Models

The  first  step  in  the  evaluation  process  is  to  define  in  greater  detail 
the models of control originally described in [Ensl81]. One of the goals of
the  present  research  is  to  validate  the  control  models  in  order  to  examine 
their  performance  in  certain  environments.  By  looking  at  the  finer  details  of 
the  models,  significant  control  problems  have  been  discovered  which  were  not 
apparent  from  earlier  high  level  studies. 

To  accomplish  this  detailed  study,  the  models  are  translated  into  a  high 
level programming language, Pascal. The resulting code is presented in
Appendix 1 in the form of pseudo code. The pseudo code is derived from the actual
Pascal  code  and  is  presented  in  place  of  the  actual  code  in  order  to  conserve 
space. 

2.3.2  Construction  of  an  FDPS  Simulator 

In  order  to  perform  both  validation  and  performance  analysis  it  is 
necessary  to  construct  an  FDPS  simulator.  The  models  of  control  are 
translated  into  Pascal,  and  the  resulting  code  is  incorporated  into  the 
simulator.  Validation  is  accomplished  by  constructing  various  test  cases 
which  are  designed  to  exercise  the  particular  executive  control  functions 
being  tested.  A  detailed  transaction  log  is  maintained  in  order  to  follow  the 
actions of the simulator, and, thus, verify the correct or incorrect
performance of each portion of the executive control.

The simulator also collects various performance measurements. These are
processed at the termination of the experiment in order to generate
performance statistics. The interval during which measurements are collected is
user  controllable.  This  allows  one  to  measure  steady  state  values  as  well  as 
performance during startup.

2.3.3 Simulation Experiments

Simulation  experiments  are  conducted  in  two  phases.  The  first  phase  is 
designed  to  validate  the  various  models  of  control.  In  these  experiments, 
there  is  no  need  to  collect  performance  measurements;  instead,  a  detailed  log 
of  the  simulator’s  actions  is  maintained.  This  is  then  analyzed  in  order  to 
observe  the  behavior  of  the  control  model  under  test. 

In the second phase of experiments, performance measurements are collected,
but no transaction log is maintained. These experiments are used to
obtain  data  concerning  the  relative  performance  of  the  various  models  of 
control.  In  order  to  obtain  steady  state  data,  measurements  are  not  collected 
until  some  time  after  startup.  Several  simulations  are  performed  on  each 
model  of  control.  Each  simulation  provides  the  control  with  a  different 
environment.  To  obtain  different  environments,  the  interconnection  topology 
and  the  bandwidths  of  the  communication  links  are  varied. 

The  load  for  the  simulator  is  generated  in  the  following  manner.  The 
user-specified configuration determines the number of nodes, the connectivity
of  these  nodes,  the  number  of  terminals  attached  to  each  node,  and  the  initial 
state  of  the  file  system.  The  file  system  includes  data  files,  command  files, 
and  object  files.  Each  object  file  specifies  a  script  of  actions  to  be 
simulated  in  order  to  simulate  the  execution  of  a  user  process.  The  user  of 
the  simulator  provides  a  series  of  commands  that  can  originate  from  a 
terminal.  These  commands  form  a  population  of  commands  from  which  the  load 
generator  randomly  selects  commands  for  arrival  from  specific  terminals.  The 
time  of  command  arrival  is  determined  by  generating  a  random  number  from  a 
particular interval marked by a minimum and a maximum time delay between
submission of commands.
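
The load generation procedure just described can be sketched as follows. This is not the report's Pascal code. It assumes a uniform distribution over the delay interval (the text only says "a random number"), an independent clock per terminal, and hypothetical names throughout.

```python
import random


def generate_load(commands, terminals, min_delay, max_delay, per_terminal, seed=0):
    """Return a time-ordered list of (arrival_time, terminal, command) tuples.

    Commands are drawn at random from the user-supplied population, and the
    gap between successive submissions at a terminal is drawn uniformly
    (an assumption) from [min_delay, max_delay].
    """
    rng = random.Random(seed)  # seeded so an experiment can be repeated
    arrivals = []
    for terminal in terminals:
        clock = 0.0
        for _ in range(per_terminal):
            clock += rng.uniform(min_delay, max_delay)
            arrivals.append((clock, terminal, rng.choice(commands)))
    arrivals.sort()
    return arrivals
```

A call such as `generate_load(["edit", "compile"], ["t1", "t2"], 1.0, 5.0, 4)` yields eight arrivals, four per terminal, ordered by arrival time. Fixing the seed is a design choice that allows each model of control to be driven by an identical load.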

2.3.4 Validation of Control Models

Validation  of  the  models  of  control  is  achieved  by  constructing  input 
scripts designed to exercise the particular executive control being tested.
The resulting transaction log is analyzed to ensure the correct performance of
the  executive  control. 

2.3.5 Comparison of Relative Performance of the Models

After  each  test,  the  data  reduction  portion  of  the  simulator  utilizes 
the performance measurements gathered during the specified interval of time to
compute the following statistics:

1. The average service time for a user session, for a work
request, and for a process. (This is computed for all nodes
and also averaged over all nodes.)

2. The average response time for a user session, for a work
request, and for a process. (This is computed for all nodes
and also averaged over all nodes.)

3. The throughput for user sessions, for work requests, and for
processes. (This is computed for all nodes and also averaged
over all nodes.)

4.  For  the  READY  QUEUE  on  each  node,  the  MESSAGE  BLOCKED  QUEUE  on 
each  node,  each  DISK  WAITING  QUEUE  on  each  node,  and  each  LINK 
QUEUE  on  each  node  the  following  statistics  are  compiled: 

a. The minimum time spent by a process in the
queue.

b. The maximum time spent by a process in the
queue.

c. The average time spent by a process in the
queue.

d.  The  minimum  queue  length  observed  by  a  process 
entering  the  queue. 

e.  The  maximum  queue  length  observed  by  a  process 
entering  the  queue. 

f.  The  average  queue  length  observed  by  a  process 
entering  the  queue. 

5.  The  number  of  user  messages,  control  messages,  and  the  total 
number  of  messages  sent  from  each  node  to  every  other  node. 

6.  The  number  of  user  messages,  control  messages,  and  the  total 
number  of  messages  sent  on  each  link. 

Utilizing  these  statistics,  conclusions  concerning  the  relative  merits 
of  each  of  the  models  of  control  are  made. 
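
The per-queue statistics listed in item 4 can be computed from a simple record of each queue entry. The sketch below assumes, hypothetically, that the simulator logs one (time_in_queue, length_seen_on_entry) pair per process; the function and key names are illustrative and do not come from the report.

```python
def queue_statistics(samples):
    """Summarize one queue from (time_in_queue, length_seen_on_entry) pairs."""
    if not samples:
        raise ValueError("no processes passed through this queue")
    times = [time_in_queue for time_in_queue, _ in samples]
    lengths = [length for _, length in samples]
    return {
        "min_time": min(times),
        "max_time": max(times),
        "avg_time": sum(times) / len(times),
        "min_length": min(lengths),
        "max_length": max(lengths),
        "avg_length": sum(lengths) / len(lengths),
    }
```

Note that the queue lengths here are those observed by arriving processes, as the list specifies, rather than time-averaged lengths; the two can differ when arrivals are bursty.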

2.4 PROJECT SCOPE AND ORGANIZATION OF THIS REPORT

Following  these  first  two  sections  of  introductory  remarks,  this  paper 
examines in finer detail the models initially presented in [Ensl81]. Section
3  contains  a  description  of  the  more  important  features  of  the  control  models 
under  examination.  A  pseudo  code  description  of  these  models  is  provided  in 
Appendix  1. 

The  simulator  used  in  the  evaluation  of  the  models  is  the  topic  of 
discussion  in  Section  4.  In  this  section,  the  goals  of  the  simulation 
experiments,  requirements  for  the  simulator,  and  the  structure  of  the 
simulator are discussed.

In Section 5, the results of the simulation experiments are examined.
This includes discussions of both the validity of the models in certain
environments  and  the  relative  performance  of  the  various  models  of  control. 

Conclusions  about  the  results  of  the  evaluation  studies  are  presented  in 
Section  6.  The  results  of  these  experiments  are  summarized  and  placed  into 
proper  perspective  and  further  questions  that  this  study  stimulated  but  failed 
to  answer  are  identified. 


SECTION  3 
MODELS  OF  CONTROL 

This  research  considers  six  different  models  of  control.  These  models 
are  described  in  general  terms  in  this  section,  and  pseudo  code  for  the  models 
is provided in Appendix 1. The models are similar in many respects, differing
usually  only  in  some  particular  aspect  of  control.  Therefore,  only  the  first 
model  is  presented  completely.  The  others  are  described  by  indicating  how 
they  differ  from  the  first  model. 

3.1  THE  XFDPS.1  CONTROL  MODEL 

The  XFDPS.1  control  model  was  first  defined  in  [Sapo80]  and  further 
refined in [Ensl81]. With the aid of a simulation environment, this model has
been  even  more  completely  defined.  The  XFDPS.1  model  is  composed  of  six  types 
of  components:  TASK  SET  MANAGERS,  FILE  SYSTEM  MANAGERS,  FILE  SET  MANAGERS, 
PROCESSOR  UTILIZATION  MANAGERS,  PROCESSOR  UTILIZATION  MONITORS,  and  PROCESS 
MANAGERS.  (See  Figure  1.)  The  basic  strategy  of  this  model  of  control  is  to 
partition  the  system’s  resources  and  assign  separate  components  to  manage  each 
partition. 

3.1.1  Task  Set  Manager 

A  TASK  SET  MANAGER  is  assigned  to  each  user  terminal  as  well  as  to  each 
executing  command  file.  The  name  TASK  SET  MANAGER  results  from  the  nature  of 
user  work  requests  which  originate  from  user  terminals  and  command  files.  The 
work requests specify one or more executable files called tasks (these contain
either  object  code  or  commands)  and  any  input  or  output  files  used  by  the 
tasks.  It  is  possible  for  the  tasks  of  a  work  request  to  communicate,  and 
this  communication  (task  connectivity)  is  also  described  by  the  work  request. 
Therefore,  each  work  request  specifies  a  set  of  tasks,  and  it  is  the  job  of 
the  TASK  SET  MANAGER  to  control  the  execution  of  that  set  of  tasks. 

When a work request arrives, the TASK SET MANAGER parses the work
request and initiates construction of the task graph for this work request.
In XFDPS.1, only a single copy of the task graph is maintained. This copy is
stored at the node where the TASK SET MANAGER for the work request resides.
At this stage of work request processing, the task graph contains the initial
resource requirements for the work request.

Figure 1. The XFDPS.1 Model of Control

In the next step, a message is sent to the FILE SYSTEM MANAGER residing
on the same node as the TASK SET MANAGER requesting file availability information
concerning the files needed by the work request. A message is also sent
to the PROCESSOR UTILIZATION MANAGER residing on the same node as the TASK SET
MANAGER requesting processor utilization information. This includes the
latest utilization information that this particular node has obtained from all
other nodes.

When  the  file  availability  information  and  processor  utilization 
information  arrive,  a  work  distribution  and  resource  allocation  decision  is 
made  by  the  TASK  SET  MANAGER.  At  this  point,  specific  files  are  chosen  from 
the  list  of  files  found  available  and  specific  processors  are  chosen  as  sites 
for  the  execution  of  the  various  tasks  of  the  work  request's  task  set.  In 
this  study  no  attempt  is  made  to  investigate  different  strategies  for 
distributing  work;  instead,  a  single  strategy  is  used  for  all  experiments. 
(Other  work  in  progress  in  the  FDPS  Research  Program  at  Georgia  Tech  is 
examining  the  complete  area  of  work  distribution  and  resource  allocation.)  In 
this strategy, a process is assigned to execute on the node on which its
object code resides. Data files are not moved but are accessed from the node on
which they originally resided.

Once  the  allocation  decision  is  made,  a  request  for  the  locking  of  the 
chosen files is sent by the TASK SET MANAGER to the FILE SYSTEM MANAGER residing
on the same node as the TASK SET MANAGER. The desired type of access
(READ  or  WRITE)  is  also  passed  along  with  the  lock  request.  Multiple  readers 
are  permitted,  but  readers  are  denied  access  to  files  already  locked  for 
writing,  and  writers  are  denied  access  to  files  locked  for  reading  or  writing. 
If  the  FILE  SYSTEM  MANAGER  informs  the  TASK  SET  MANAGER  that  all  the  desired 
files  have  been  successfully  locked,  execution  of  the  work  request  can  be 
initiated.  If  the  locking  operation  is  not  successful,  the  work  request  is 
aborted,  and  the  necessary  cleanup  operations  are  performed.  The  next  step 
after  successful  file  allocation  is  to  send  a  series  of  messages  to  the 
PROCESS  MANAGERS  on  the  various  nodes  that  have  been  chosen  to  execute  the 
tasks of the task set informing them that they are to execute a specific subset
of tasks.
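
The locking rules described above (multiple readers, exclusive writers, and abandonment of the request if any lock is refused) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the pseudo code of Appendix 1; the class FileLockTable and its methods are hypothetical names.

```python
class FileLockTable:
    """Reader/writer locks: many readers, or exactly one writer, per file."""

    def __init__(self):
        self.readers = {}     # file name -> number of active READ locks
        self.writers = set()  # file names currently locked for WRITE

    def can_lock(self, name, mode):
        if name in self.writers:
            return False                            # locked for writing: deny all
        if mode == "WRITE":
            return self.readers.get(name, 0) == 0   # a writer needs the file unlocked
        return True                                 # READ may join other readers

    def _grant(self, name, mode):
        if mode == "WRITE":
            self.writers.add(name)
        else:
            self.readers[name] = self.readers.get(name, 0) + 1

    def release(self, name, mode):
        if mode == "WRITE":
            self.writers.discard(name)
        else:
            self.readers[name] -= 1
            if self.readers[name] == 0:
                del self.readers[name]

    def lock_all(self, requests):
        """All-or-nothing locking of a list of (file name, mode) pairs."""
        granted = []
        for name, mode in requests:
            if not self.can_lock(name, mode):
                for n, m in granted:                # undo partial locks
                    self.release(n, m)
                return False                        # caller aborts the work request
            self._grant(name, mode)
            granted.append((name, mode))
        return True
```

A False result from lock_all corresponds to the case in which the work request is aborted and cleanup is performed.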

When  a  task  terminates,  its  PROCESS  MANAGER  reports  back  to  the  TASK  SET 
MANAGER and indicates the reason for the termination (normal or abnormal).
When  an  indication  of  an  abnormal  termination  is  received,  the  remaining 
active  tasks  of  the  task  set  are  terminated. 

After all tasks of a task set have terminated, one of three possible
actions  occurs.  If  the  source  of  commands  is  a  user  terminal,  the  user  is 
prompted  for  a  new  command.  If  the  source  is  a  command  file,  the  next  command 
is  obtained.  Finally,  if  the  source  is  a  command  file  and  all  the  commands 
have  been  executed,  the  TASK  SET  MANAGER  is  deactivated  and  the  PROCESS 
MANAGER  on  the  node  where  the  command  file  was  being  executed  is  informed  of 
the  termination  of  the  command  file. 

3.1.2  File System Manager

Replicated  on  each  node  of  the  system  is  a  component  called  the  FILE 
SYSTEM  MANAGER.  This  module  handles  the  file  system  requests  from  all  of  the 
TASK  SET  MANAGERS  including  requests  for  file  availability  information  and 
requests  to  lock  or  release  files.  FILE  SYSTEM  MANAGERS  do  not  possess  any 
directory  information.  Therefore,  to  locate  a  file,  it  is  necessary  that  all 
nodes  are  queried  as  to  the  availability  of  the  file. 

The  FILE  SYSTEM  MANAGER  satisfies  the  requests  by  consulting  with  the 
FILE SET MANAGERS (see Section 3.1.3) located on each node of the system. For
example,  when  the  FILE  SYSTEM  MANAGER  receives  a  request  for  file  availability 
information,  messages  are  prepared  and  sent  to  all  FILE  SET  MANAGERS.  The 
FILE  SYSTEM  MANAGER  collects  the  responses,  and  when  responses  from  all  FILE 
SET  MANAGERS  have  been  obtained,  it  reports  the  results  to  the  TASK  SET 
MANAGER  which  made  the  request.  Requests  for  the  locking  or  releasing  of 
files  are  handled  in  a  similar  manner. 

3.1.3  File  Set  Manager 

The  files  residing  on  each  node  of  the  system  are  managed  separately 
from  the  files  on  other  nodes  by  a  FILE  SET  MANAGER  that  is  dedicated  to 
managing  that  set  of  files.  The  duties  of  the  FILE  SET  MANAGER  include 
providing  file  availability  information  to  inquiring  FILE  SYSTEM  MANAGERS  and 
reserving,  locking,  and  releasing  files  as  requested  by  FILE  SYSTEM  MANAGERS. 
It should be noted that a side effect of gathering file availability information
is the placement of a reservation on a file that is found to be
available.
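
The interaction between a FILE SYSTEM MANAGER and the FILE SET MANAGERS, including the reservation side effect, might be sketched as below. The sketch assumes that a reserved file is reported as unavailable to later queries, which the text does not state explicitly; all class and function names are hypothetical.

```python
class FileSetManager:
    """Manages the files residing on one node."""

    def __init__(self, node, files):
        self.node = node
        self.files = set(files)
        self.reserved = set()

    def availability(self, wanted):
        # Assumption: a file already reserved is not offered again.
        found = [f for f in wanted if f in self.files and f not in self.reserved]
        self.reserved.update(found)   # side effect: reserve what was found
        return found


def query_all(file_set_managers, wanted):
    """FILE SYSTEM MANAGER role: query every node and merge the replies."""
    result = {f: [] for f in wanted}
    for fsm in file_set_managers:
        for f in fsm.availability(wanted):
            result[f].append(fsm.node)
    return result
```

Because no directory is kept, every node is consulted for every query, which is the source of the message traffic that the later models attempt to reduce.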

3.1.4  Processor Utilization Manager

Also  present  on  each  node  is  another  component  of  the  executive  control, 
the PROCESSOR UTILIZATION MANAGER. This module is assigned the task of collecting
and storing processor utilization information which is obtained from
the  PROCESSOR  UTILIZATION  MONITORS  (see  Section  3.1.5)  residing  on  each  of  the 
nodes.  When  a  TASK  SET  MANAGER  asks  the  PROCESSOR  UTILIZATION  MANAGER  for 
utilization  information,  the  PROCESSOR  UTILIZATION  MANAGER  responds  with  the 
data  available  at  the  time  of  the  query. 

3.1.5  Processor Utilization Monitor

Each  node  of  the  system  also  has  a  PROCESSOR  UTILIZATION  MONITOR  that  is 
responsible  for  collecting  various  measurements  needed  to  arrive  at  a  value 
describing  the  current  utilization  of  the  processor  on  which  the  PROCESSOR 
UTILIZATION  MONITOR  resides.  The  processor  utilization  value  is  periodically 
transmitted  to  the  PROCESSOR  UTILIZATION  MANAGERS  on  all  nodes. 
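
The division of labor between the MONITOR (which measures and broadcasts) and the MANAGER (which stores the most recent report from each node) can be illustrated with the following Python sketch. The names and the use of a single floating-point utilization value are assumptions made for illustration only.

```python
class ProcessorUtilizationManager:
    """Stores the latest utilization value reported by each node."""

    def __init__(self):
        self.latest = {}   # node id -> most recent utilization report

    def record(self, node, utilization):
        self.latest[node] = utilization   # newer reports overwrite older ones

    def query(self):
        # A TASK SET MANAGER receives whatever is on hand; values may be stale.
        return dict(self.latest)


def report(node, utilization, managers):
    """MONITOR role: send one periodic report to the MANAGER on every node."""
    for manager in managers:
        manager.record(node, utilization)
```

A query never triggers new measurements; it returns only the data available at the time of the query, as described in Section 3.1.4.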

3.1.6  Process  Manager 

Residing  on  each  node  of  the  system  is  a  PROCESS  MANAGER  whose  function 
is  to  supervise  the  execution  of  processes  executing  on  the  node  on  which  it 
resides.  The  PROCESS  MANAGER  is  responsible  for  activating  and  deactivating 
processes.  If  the  execution  file  for  a  process  is  an  object  file,  the  PROCESS 
MANAGER  will  load  the  object  file  into  memory.  This  file  may  reside  either 
locally  or  on  a  distant  node.  If  the  execution  file  is  a  command  file,  the 
PROCESS  MANAGER  sees  that  a  TASK  SET  MANAGER  is  activated  to  respond  to  the 
commands  of  that  command  file.  The  PROCESS  MANAGER  is  also  responsible  for 
handling  process  termination.  This  involves  releasing  local  resources  held  by 
the  process  and  informing  the  TASK  SET  MANAGER  that  requested  the  execution  of 
the  process  as  to  the  termination  of  the  process. 

3.1.7  File  Process 

In  order  to  provide  file  access  in  a  manner  that  is  uniform  with  the 
operation  of  the  rest  of  the  system,  another  type  of  control  process  is 
utilized,  the  FILE  PROCESS.  For  each  access  to  a  file,  an  instance  of  a  FILE 
PROCESS  is  created.  Therefore,  if  process  "A"  is  accessing  file  "X"  and 
process  "B"  is  also  accessing  file  "X",  there  will  be  two  instances  of  a  FILE 
PROCESS,  each  responsible  for  a  particular  access  to  file  "X".  Communication 
between  FILE  PROCESSes  and  user  processes  (file  reads  and  writes)  or  between 
FILE  PROCESSes  and  PROCESS  MANAGERS  (loading  of  object  programs)  is  handled  in 
the  same  manner  as  communication  between  user  processes. 

3.2  THE XFDPS.2 CONTROL MODEL

The XFDPS.2 model of control differs from the XFDPS.1 model in the manner
in which file management is conducted. In this model a centralized directory
is maintained. In Appendix 1 the component named FILE SYSTEM MANAGER
maintains  this  directory.  This  component  resides  on  only  one  node,  the  node 
where  the  file  system  directory  is  maintained.  TASK  SET  MANAGERS  communicate 
directly  with  this  component  in  order  to  gain  availability  information,  lock 
files,  or  release  files. 

When  a  file  is  locked  it  is  necessary  to  create  a  FILE  PROCESS  in  order 
to  provide  access  to  the  file.  To  accomplish  this  task,  the  FILE  SYSTEM 
MANAGER sends a message to the node where the file resides requesting activation
of a FILE PROCESS providing access to the file. Once this process is
created,  the  FILE  SYSTEM  MANAGER  is  given  the  name  of  the  FILE  PROCESS  which 
it  then  returns  to  the  TASK  SET  MANAGER  that  requested  the  file  lock. 

3.3  THE XFDPS.3 CONTROL MODEL

In the XFDPS.1 model of control a search for file availability information
encompassing all nodes is conducted for each work request. Obtaining
this global information is important when one is attempting to obtain optimal
resource allocations. In those instances where this is not important, a slight
variation on the search strategy may be utilized. This strategy is the
distinguishing feature of the XFDPS.3 model of control.

Instead  of  immediately  embarking  on  a  global  search,  a  search  of  local 
resources  (i.e.,  resources  that  reside  on  the  same  node  where  the  work  request 
originated) is conducted. If all of the required resources are located, no
further searches are conducted, and the operations of locking files, activating
processes, etc., described for model XFDPS.1 are executed. If, on the other
hand, all required resources could not be found, the strategy of model XFDPS.1
is utilized.
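
The local-first search of XFDPS.3 can be sketched as follows. This Python fragment is illustrative only: the function name, the dictionary representation of file placement, and the choice of the first node found in the global case are assumptions, not details taken from the model.

```python
def locate_files(wanted, local_node, files_on_node):
    """Return ({file: node}, scope) using a local search before a global one.

    files_on_node maps a node id to the set of file names residing there.
    """
    local = files_on_node.get(local_node, set())
    if all(f in local for f in wanted):
        # Everything is on the originating node: no global search is needed.
        return {f: local_node for f in wanted}, "local"

    # Otherwise fall back to the XFDPS.1 strategy and consult every node.
    found = {}
    for node in sorted(files_on_node):
        for f in wanted:
            if f in files_on_node[node]:
                found.setdefault(f, node)   # keep the first node offering the file
    return found, "global"
```

The saving comes entirely from the first branch; when it fails, the cost is that of the local search plus the full global search.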

3.4  THE XFDPS.4 CONTROL MODEL

The XFDPS.4 model of control utilizes a file management strategy similar
to that of the ARAMIS Distributed Computer System [Caba79a,b] in which multiple
redundant file system directories are maintained on all nodes of the
system. However, since detailed information about the system described in
[Caba79a,b]  is  not  available,  model  XFDPS.4  cannot  be  claimed  to  be  an 
accurate  model  of  that  system. 

To  preserve  the  consistency  of  the  redundant  copies  of  the  file  system 
directory  and  to  provide  mutually  exclusive  access  to  resources,  the  following 
steps  are  taken.  A  control  message,  the  control  vector  (CV),  is  passed  from 
node  to  node  according  to  a  predetermined  ordering  of  the  nodes.  The  holder 
of  the  CV  can  either  release,  reserve,  or  lock  files.  Therefore,  each  node 
collects file system requests and waits for the CV to arrive. Once in possession
of the CV, a node can perform the actions necessary to fulfill the
requests  it  has  collected. 

The  modifications  to  the  file  system  directory  are  then  placed  into  a 
message  called  the  update  vector  (UPV)  which  is  passed  to  all  nodes  in  order 
to  bring  all  copies  of  the  file  system  directory  into  a  consistent  state. 
When the UPV returns to the node holding the CV, all updates have been recorded,
and the CV can be sent on to the next node.
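
The circulation of the CV and UPV can be approximated by the sketch below. It models the protocol synchronously (the UPV visits every node before the CV advances) and ignores message delays and failures; the function name and data layout are hypothetical.

```python
def circulate_cv(order, pending, directories, rounds=1):
    """Pass the control vector (CV) around the nodes in a fixed order.

    order:       node ids in the predetermined CV order
    pending:     {node: [(file name, new state), ...]} requests collected so far
    directories: {node: {file name: state}} redundant directory copies
    """
    for _ in range(rounds):
        for holder in order:
            # Only the CV holder may release, reserve, or lock files.
            updates = pending.get(holder, [])
            pending[holder] = []
            # The update vector (UPV) carries the changes to every copy.
            for node in order:
                for name, state in updates:
                    directories[node][name] = state
            # The UPV has returned: the CV moves on to the next node.
    return directories
```

Because changes are made only by the CV holder and propagated before the CV moves, all copies agree whenever the CV changes hands.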

3.5  THE XFDPS.5 CONTROL MODEL

In  the  XFDPS.5  model,  files  are  not  reserved  when  the  initial 
availability  request  is  made,  and  they  are  locked  only  after  the  work 
distribution  and  resource  allocation  decision  has  been  made.  This  strategy 
leads  to  the  possibility  of  generating  an  allocation  plan  that  is  impossible 
to  carry  out  if  a  file  chosen  for  allocation  has  been  given  to  another  process 
during  the  interval  in  which  the  resource  allocation  decision  is  made.  In  the 
previous  models,  the  executive  control  is  assured  of  an  allocation  being 
accepted,  assuming  no  component  fails. 

3.6  THE XFDPS.6 CONTROL MODEL

In  the  XFDPS.1  model,  the  task  graph  for  a  particular  work  request  is 
maintained  as  a  single  unit  and  stored  on  only  one  node,  the  node  at  which  the 
work  request  originates.  The  XFDPS.6  model  of  control  utilizes  a  slightly 
different  strategy.  The  task  graph  is  constructed  on  a  single  node,  but  once 
a  work  distribution  and  resource  allocation  decision  has  been  made,  portions 
of  the  task  graph  are  sent  to  various  nodes.  Specifically,  those  nodes  chosen 
to  execute  the  various  tasks  of  the  task  graph  are  given  that  portion  of  the 
task graph for which they are responsible. Each node, then, must activate the
tasks  assigned  to  it  and  collect  termination  information  concerning  those 
tasks.  When  all  tasks  assigned  to  a  particular  node  have  terminated,  the  node 
where  the  work  request  originally  arrived  is  informed  of  their  termination. 
One  can  view  this  strategy  as  a  two-level  hierarchy. 
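
The two-level arrangement of XFDPS.6 can be illustrated as follows: the task graph is split by executing node, and the originating node waits for one completion report per node rather than one per task. The names partition and OriginTracker are hypothetical, and connectivity information in the task graph is omitted for brevity.

```python
from collections import defaultdict


def partition(assignment):
    """Split a {task: node} allocation into {node: set of tasks}."""
    parts = defaultdict(set)
    for task, node in assignment.items():
        parts[node].add(task)
    return dict(parts)


class OriginTracker:
    """Kept at the node where the work request arrived."""

    def __init__(self, parts):
        self.waiting = set(parts)   # nodes that have not yet reported

    def node_done(self, node):
        # Called when a node reports that all of its assigned tasks terminated.
        self.waiting.discard(node)
        return not self.waiting     # True once the whole work request is finished
```

Each executing node plays the lower level of the hierarchy for its own portion, collecting task terminations locally before sending its single report.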


SECTION 4
THE  SIMULATOR 

In order to obtain quantitative information concerning the relative performance
of the various models of control, simulation experiments are conducted.
The goals of these experiments are to validate the models of control
described  in  Section  3  and  gather  data  on  their  relative  performance.  In 
order  to  be  able  to  express  the  differences  between  the  various  models,  it  is 
necessary  that  the  simulator  provide  for  the  specification  of  relatively  low 
level  features  of  the  control  models. 

4.1  REQUIREMENTS FOR THE SIMULATOR

The  goals  described  above  necessitate  the  establishment  of  several 
requirements  for  the  simulator.  In  order  to  handle  low  level  control  problems 
and  document  solutions  to  these  problems,  the  control  models  must  be  defined 
in  a  language  capable  of  clearly  expressing  the  level  of  detail  required  at 
this  stage  of  design.  Because  a  number  of  models  are  to  be  tested,  it  is 
important that the coding effort for these models be minimized.

It  is  expected  that  the  architecture  of  the  network  as  well  as  that  of 
individual  nodes  in  the  network  will  affect  the  relative  performance  of 
various  control  models.  Therefore,  one  must  be  able  to  easily  modify  various 
architectural  attributes.  This  includes  network  connectivity,  network  link 
capacities,  and  the  capacities  and  processing  speeds  of  the  individual  nodes 
of  the  network. 

Validation of control models is one of the primary goals of the simulation
studies. To achieve this goal the simulator must provide the ability to
establish  specific  system  states.  In  other  words,  specific  detailed  instances 
of  work  requests  need  to  be  constructed  along  with  the  establishment  of 
specific resource states (e.g., one must be able to set up a series of files
in  specific  locations).  These  capabilities  allow  one  to  exercise  specific 
features  of  the  control  models. 

The  simulation  studies  also  provide  performance  information.  The 
simulator  must  utilize  a  technique  for  generating  work  requests  reflecting 
specific  distributions.  It  also  needs  to  collect  a  variety  of  performance 
measurements and generate appropriate statistical results.

4.2  THE  STRUCTURE  OF  THE  SIMULATOR 

The  simulator  is  event  based  and  programmed  in  Pascal.  It  simulates  the 
hardware components of an FDPS, functions typically provided by local operating
systems, functions provided by a distributed and decentralized control,
and  the  load  placed  upon  the  system  by  users  attached  to  the  system  through 
terminals. 

4.2.1  Architecture  Simulated 

The  hardware  organization  that  is  simulated  is  depicted  in  Figure  2. 
The  complete  system  consists  of  a  number  of  nodes  connected  by  half-duplex 
communication  links.  Each  node  contains  a  CPU,  a  communications  controller, 
and  perhaps  a  number  of  disks.  Connected  to  each  node  are  a  number  of  user 
terminals.  The  disk  simulation  is  such  that  no  actual  information  is  stored; 
only  the  delays  experienced  in  performing  disk  input/output  are  considered. 
User  interprocess  communication  (IPC)  is  simulated  with  time  delays  but  no 
exchange  of  real  data  takes  place.  However,  IPC  between  components  of  the 
executive control involves both simulation of the time delays involved in message
transfer and the actual transfer of control information to another
simulated  node. 

4.2.2  Local  Operating  System 

Components  typically  found  in  local  operating  systems  are  also 
simulated.  These  include  the  dispatcher  and  the  device  drivers.  The  local 
operating  systems  are  multitasking  systems  with  each  node  capable  of  utilizing 
a different time slice. User processes are serviced in a first-come,
first-served manner and can be interrupted for any of the following reasons: 1) a
control  process  needs  to  execute  (user  process  is  delayed  until  the  control 
process  releases  the  processor),  2)  the  user  process  exhausts  its  time  slice 
(user  process  is  placed  at  the  end  of  the  READY  QUEUE),  3)  the  user  process 
attempts  to  send  or  receive  a  message  (user  process  is  placed  on  the  MESSAGE 
BLOCKED  QUEUE),  or  4)  the  user  process  terminates. 

The  processes  serviced  by  the  simulator  are  capable  of  performing  the 
following  actions:  compute,  send  a  message,  receive  a  message,  or  terminate. 
A process can access a file by communicating with a FILE PROCESS which is
activated for the specific purpose of providing access to the file for this
process. FILE PROCESSes are the only processes that initiate any disk
activity. As far as a user process is concerned, a file access is simply a
communication with another process.

Figure 2. The Architecture Supported by the Simulator for Each Node

The  following  process  queues  are  maintained:  READY  QUEUE,  DISK  WAITING 
QUEUE,  and  MESSAGE  BLOCKED  QUEUE.  (See  Figure  3.)  A  newly  activated  process 
is  placed  in  the  READY  QUEUE.  The  DISPATCHER  selects  a  process  from  the  READY 
QUEUE to run on the CPU. If the running process exhausts its time slice, it
is  returned  to  the  READY  QUEUE.  If  it  either  attempts  to  send  or  receive  a 
message,  it  is  placed  in  the  MESSAGE  BLOCKED  QUEUE  where  it  remains  until 
either  the  message  is  placed  in  the  proper  link  queue  (send  operation)  or  a 
message  is  received  (receive  operation).  After  leaving  the  MESSAGE  BLOCKED 
QUEUE,  a  process  returns  to  the  READY  QUEUE. 

The  only  processes  capable  of  performing  disk  input/output  on  the 
simulator  are  FILE  PROCESSes.  These  are  executive  control  processes  that  are 
assigned to provide access to the files of the file system. When a FILE
PROCESS attempts a disk access, it is blocked and placed in the DISK WAITING
QUEUE  for  processes  waiting  to  access  that  same  disk.  As  the  disk  requests 
are  satisfied,  these  processes  are  returned  to  the  READY  QUEUE. 
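
The movement of processes among the READY QUEUE, MESSAGE BLOCKED QUEUE, and DISK WAITING QUEUEs can be summarized in a short Python sketch. It is a simplified illustration rather than the simulator's Pascal code: control-process preemption is omitted, and the class name, method names, and reason strings are hypothetical.

```python
from collections import deque


class NodeQueues:
    """Process queues of one simulated node."""

    def __init__(self):
        self.ready = deque()   # READY QUEUE, served first come first served
        self.blocked = set()   # MESSAGE BLOCKED QUEUE
        self.disk_wait = {}    # disk id -> DISK WAITING QUEUE

    def activate(self, proc):
        self.ready.append(proc)            # new processes join the READY QUEUE

    def dispatch(self):
        return self.ready.popleft() if self.ready else None

    def leave_cpu(self, proc, reason, disk=None):
        if reason == "timeslice":
            self.ready.append(proc)        # back to the end of the READY QUEUE
        elif reason == "message":
            self.blocked.add(proc)         # wait until the send/receive completes
        elif reason == "disk":
            self.disk_wait.setdefault(disk, deque()).append(proc)
        elif reason != "terminate":
            raise ValueError(f"unknown reason: {reason}")

    def message_done(self, proc):
        self.blocked.remove(proc)
        self.ready.append(proc)

    def disk_done(self, disk):
        self.ready.append(self.disk_wait[disk].popleft())
```

In the simulator only FILE PROCESSes would ever leave the CPU for the "disk" reason, since they alone perform disk input/output.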

4.2.3  Message  System 

The  communication  system  consists  of  a  series  of  half-duplex  connections 
between  pairs  of  nodes.  Messages  are  transmitted  using  a  store-and-forward 
method. Messages received at intermediate nodes in a path are stored and forwarded
to the next node at a time dictated by the communication policy being
utilized.  For  example,  the  policy  may  require  that  the  new  message  be  placed 
at  the  end  of  the  queue  of  all  messages  to  be  transmitted  on  a  particular 
link.  (This  is  the  policy  utilized  in  all  experiments.) 

The  message  queues  available  on  each  node  are  depicted  in  Figure  4.  If 
a  newly  created  message  is  an  intranode  message,  it  is  placed  in  the  MESSAGE 
QUEUE; otherwise, it is placed in the LINK QUEUE that corresponds to the communication
link over which the message is to be transmitted. Messages are
removed from the LINK QUEUES and transmitted as the communication links become
available.

Messages  in  the  MESSAGE  QUEUE  originate  either  from  processes  sending 
intranode messages or from the communication links connected to the node.
Messages destined for processes on the same node as the MESSAGE QUEUE are
placed in the appropriate PORT QUEUE of the process to which they are addressed.
Messages that have not yet reached their destination are placed in the
LINK QUEUE corresponding to the communication link over which the message is
to be transmitted.
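
The handling of a message taken from the MESSAGE QUEUE can be sketched as below, using the end-of-queue policy employed in all experiments. The routing table next_hop, the message layout, and the function name are assumptions introduced for illustration.

```python
def route(message, here, next_hop, port_queues, link_queues):
    """Deliver a message locally or queue it on the outgoing link.

    message:     dict with key "dest" holding (destination node, process)
    here:        id of the node doing the routing
    next_hop:    hypothetical table {destination node: outgoing link id}
    port_queues: {process: list of messages} on this node
    link_queues: {link id: list of messages} on this node
    """
    dest_node, dest_proc = message["dest"]
    if dest_node == here:
        # Destination reached: place in the PORT QUEUE of the addressee.
        port_queues.setdefault(dest_proc, []).append(message)
    else:
        # Store and forward: append to the end of the proper LINK QUEUE.
        link = next_hop[dest_node]
        link_queues.setdefault(link, []).append(message)
```

The same routine would serve both intranode messages and messages arriving from a communication link, since both pass through the MESSAGE QUEUE.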

Figure 3. Process Queues on Each Node

Figure 4. Message Queues on Each Node

4.2.4  Input for the Simulator

The  simulator  requires  the  following  six  types  of  input: 

1.  Control model

2.  Network configuration (i.e., nodes and their connectivity)

3.  Work  requests 

4.  Command  files 

5.  Object  files 

6.  Data  files 

The  nature  of  these  inputs  and  how  they  are  provided  to  the  simulator  is 
described  below. 

4.2.4.1  Control Model

There  are  two  possible  approaches  for  representing  the  control  model  in 
the  simulator:  1)  data  to  be  interpreted  by  the  simulator  and  2)  code  that  is 
actually  part  of  the  simulator.  The  first  technique  requires  that  the 
simulator  contain  or  include  a  rather  sophisticated  interpreter  in  order  to 
provide  a  convenient  language  with  which  one  can  express  a  control  model  that 
addresses  the  control  problems  to  a  sufficiently  low  level  of  detail.  The 
second  technique  requires  the  careful  construction  of  the  simulator  such  that 
those  portions  of  the  simulator  that  express  the  control  model  are  easily 
identified  and  can  be  removed  and  modified  with  minimal  effort.  The  second 
technique  also  requires  a  recompilation  of  the  simulator  code  each  time  a 
control  model  modification  is  performed. 

The  problems  involved  in  constructing  a  sophisticated  interpreter  are 
much  greater  than  those  faced  in  organizing  the  simulator  so  that  the  portions 
of  code  expressing  the  control  model  are  easily  isolated.  Therefore,  in  this 
simulator,  the  control  models  are  expressed  in  Pascal  and  are  actually  part  of 
the  simulator  rather  than  being  separate  input  to  the  simulator. 

4.2.4.2  Network Configuration

The  attributes  provided  as  input  to  the  simulator  which  are  concerned 
with  the  physical  configuration  of  the  FDPS  are  provided  in  Table  2.  Figure  5 
describes  the  syntax  of  the  statements  used  to  enter  the  FDPS  configuration 
information. Two types of input can be provided, node configuration information
and communication linkage information. Each statement beginning with the
letter 'n' describes the configuration of the node which is identified by the
digit following the 'n'. This statement describes certain characteristics
concerning the processor at the node (memory capacity, processing speed, and
the length of a user time slice) and the peripheral devices (user terminals
and disks) attached to the processor. Each statement beginning with the letter
'l' describes a half-duplex communication link between two nodes. It
identifies  the  source  and  destination  nodes  by  their  identification  number 
(the  digit  following  the  letter  'n'  on  statements  describing  nodes)  and 
indicates  the  effective  bandwidth  of  the  communication  link.  It  is  assumed 
that  all  messages  are  transmitted  at  this  speed,  and  no  attempt  is  made  to 
simulate  errors  in  transmission  and  the  resulting  retransmissions. 


Table 2. Physical Configuration Input to the Simulator

Node Information
    Memory Capacity (bytes)
    Processing Speed (instructions/second)
    Size of a Time Slice (microseconds)
    Number of Attached User Terminals
    Number of Attached Disks
    Disk Transfer Speed (bytes/second)
    Average Disk Latency (microseconds)

Link Information
    Identities of the Source and Destination Nodes
    Bandwidth (bytes/second)


4.2.4.3  Work Requests

Work  requests  are  assumed  to  originate  from  two  sources:  1)  directly 
from  a  user,  or  2)  through  command  files.  The  syntax  of  a  work  request  is 
given  in  Figure  6.  This  syntax  is  a  subset  of  the  command  language  available 
through  the  Advanced  Command  Interpreter  of  the  Georgia  Tech  Software  Tools 
System  [Akin80]. 

A  work  request  is  basically  a  specification  of  a  logical  network  of 
tasks.  The  nodes  of  the  logical  network  represent  tasks  and  the  links 
represent  communication  paths  between  the  tasks.  A  node  specification 
includes  the  following:  an  optional  label  to  identify  the  node,  a  command 
name  (this  may  name  either  an  object  file  or  a  command  file),  and  any  I/O 
redirection.  A  node  can  be  identified  either  by  its  label,  if  it  possesses 
one,  or  by  its  position  on  the  command  line.  For  example,  in  the  command 

<entry> ::= <link> | <node>

<link> ::= l <from> <to> <bandwidth>    (all links are half-duplex)

<node> ::= n <node id> <memory> <speed> <timeslice> <terminals>
           <disk> <disk speed> <disk latency>

<from> ::= <node id>

<to> ::= <node id>

<node id> ::= <integer>

<bandwidth> ::= <integer (link bandwidth in bytes per second)>

<memory> ::= <integer (main memory in bytes)>

<speed> ::= <integer (average speed of the CPU in instructions per second)>

<timeslice> ::= <integer (microseconds)>

<terminals> ::= <integer (number of attached user terminals)>

<disk> ::= <integer (number of attached disks)>

<disk speed> ::= <integer (transfer speed of disk in bytes/sec)>

<disk latency> ::= <integer (average disk latency in microseconds)>

<integer> ::= <digit> { <digit> }


Examples:

n 1 256000 5000000 1000 50 3 500000 100

(Node #1 has 250K bytes of memory, processes at the rate of
5 MIPS, has a time slice of 1000 microseconds, has 50 user
terminals attached to it, has 3 disks attached to it,
each disk can transfer at the rate of 500,000 bytes/sec,
and each disk has an average latency of 100 microseconds.)

l 5 6 4000000

(This link connects node 5 to node 6 with a half-duplex
communication path that can transmit at the rate of
4 million bytes/sec.)


Figure 5. Syntax of FDPS Configuration Input for the Simulator
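
A reader for statements in the syntax of Figure 5 might look like the following Python sketch. It is not part of the simulator, which is written in Pascal; the function name parse_entry and the dictionary keys are hypothetical, chosen to mirror the nonterminals of the grammar.

```python
NODE_FIELDS = ("node_id", "memory", "speed", "timeslice",
               "terminals", "disks", "disk_speed", "disk_latency")
LINK_FIELDS = ("source", "destination", "bandwidth")


def parse_entry(line):
    """Parse one 'n ...' or 'l ...' statement into (kind, {field: value})."""
    fields = line.split()
    if not fields:
        raise ValueError("empty statement")
    kind = fields[0]
    if kind == "n":
        names = NODE_FIELDS
    elif kind == "l":
        names = LINK_FIELDS
    else:
        raise ValueError(f"unknown statement type: {kind!r}")
    values = [int(v) for v in fields[1:]]   # every operand is an <integer>
    if len(values) != len(names):
        raise ValueError(f"expected {len(names)} operands, got {len(values)}")
    return kind, dict(zip(names, values))
```

Applied to the two examples in Figure 5, the function yields a node record for node 1 and a link record from node 5 to node 6.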
<work request> ::= <logical net>

<logical net> ::= <logical node> { <node separator>
                  { <node separator> } <logical node> }

<node separator> ::= , | <pipe connection>

<pipe connection> ::= [ <port> ] '|' [ <logical node number> ]
                      [ .<port> ]

<port> ::= <integer>

<logical node number> ::= <integer> | $ | <label>

<logical node> ::= [ :<label> ] <simple node>

<simple node> ::= { <i/o redirector> } <command name>
                  { <i/o redirector> }

<i/o redirector> ::= <file name> '>' [ <port> ] |
                     [ <port> ] '>' <file name> |
                     [ <port> ] '>>' <file name> |
                     '>>' [ <port> ]

<command name> ::= <command file name> | <object file name>

<label> ::= <identifier>

<file name> ::= <data file name>

<identifier> ::= <letter> { <letter> | <digit> }

<integer> ::= <digit> { <digit> }

Examples:

pgm1 | pgm2 1|a 2|b :a pgm3 | pgm4 |c.1 :b pgm5 | pgm6 |.2 :c pgm7
(For an explanation of this example see Figure 7.)

Figure  6.  Work  Request  Syntax 
(Based  on  [AKIN80]) 
below, the second node has the label 'a' and the command name 'cmnd2'.

cmnd1 | :a cmnd2

This node can be identified either by the label 'a' or its position '2' but
not by its name, 'cmnd2'.

I/O redirection is used to connect ports of a task to files in the file
system.  (The  default  for  I/O  is  "standard  input/output,"  i.e.,  the  user's 
terminal.)  In  the  example  below,  input  port  number  three  is  connected  to  file 
'in'  and  output  port  number  one  is  connected  to  file  'out'. 
in>3  cmnd  1>out 

The  specification  of  the  port  number  in  the  I/O  redirector  is  optional.  If  it 
is  omitted,  the  next  unused  port  number  is  assumed.  Therefore,  in  the  example 
below, output port number one is connected to file 'out1', output port number
two is connected to file 'out2', and output port number three is connected to
file 'out3'.

cmnd  >out1  2>out2  >out3 
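
The default-port rule can be sketched as follows. This is only an illustration: the function name `assign_ports` is hypothetical, and it assumes that "next unused" means the smallest port number not yet taken when redirectors are processed from left to right, which the text does not spell out.

```python
def assign_ports(requested):
    """requested: explicit port numbers, or None where the port is omitted.
    Returns the port each redirector ends up with, in order."""
    used, result = set(), []
    for port in requested:
        if port is None:
            port = 1
            while port in used:      # smallest port not yet taken (assumed rule)
                port += 1
        used.add(port)
        result.append(port)
    return result


# 'cmnd >out1 2>out2 >out3': the first and third redirectors omit the port.
ports = assign_ports([None, 2, None])
print(ports)   # [1, 2, 3]
```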

Nodes  are  separated  by  node  separators  which  can  be  either  the  comma 
symbol  or  the  vertical  bar  symbol.  The  comma  symbol  is  used  to  separate  a 
node  that  does  not  have  any  output  ports  connected  to  any  other  nodes.  The 
vertical  bar  symbol  or  pipe  symbol  is  used  to  identify  the  connection  of  an 
output  port  of  the  node  immediately  preceding  the  pipe  symbol  and  the  input 
port  of  another  node.  The  port  numbers  and  logical  node  number  of  the  pipe 
specification  may  be  omitted  and  default  values  assumed.  If  a  port  number  is 
omitted,  the  next  unused  port  number  for  the  node  possessing  the  port  is  used. 
The  logical  node  number  of  the  pipe  specification  identifies  a  node  of  the 
logical  network.  It  may  either  be  an  integer  identifying  the  position  of  the 
node  on  the  command  line,  the  symbol  '$'  which  identifies  the  last  node  on  the 
command  line,  or  a  node  label.  If  no  other  node  is  specified,  the  node 
immediately  following  the  pipe  symbol  is  assumed  to  be  the  destination  of  the 
output  of  the  pipe. 

An example of a work request utilizing this syntax is shown in Figure 7.
This  command  consists  of  seven  logical  nodes  connected  in  the  manner  depicted 
in  the  figure.  It  demonstrates  several  forms  of  pipe  specifications  including 
the  use  of  labels  in  identifying  nodes. 
Work Request:

pgm1 | pgm2 1|a 2|b :a pgm3 | pgm4 |c.1 :b pgm5 | pgm6 |.2 :c pgm7
    (0)     (1) (2) (3)    (4)     (5)  (6)    (7)     (8) (9)

(0) Output port 1 of pgm1 is connected to input port 1 of pgm2.

(1) Output port 1 of pgm2 is connected to input port 1 of the
    logical node labeled "a," pgm3.

(2) Output port 2 of pgm2 is connected to input port 1 of the
    logical node labeled "b," pgm5.

(3) Label for the logical node containing pgm3 as its execution
    module.

(4) Output port 1 of pgm3 is connected to input port 1 of pgm4.

(5) Output port 1 of pgm4 is connected to input port 1 of the
    logical node labeled "c," pgm7.

(6) Label for the logical node containing pgm5 as its execution
    module.

(7) Output port 1 of pgm5 is connected to input port 1 of pgm6.

(8) Output port 1 of pgm6 is connected to input port 2 of pgm7.

(9) Label for the logical node containing pgm7 as its execution
    module.


Data Flow Graph of the Work Request:

Figure 7. Example of a Work Request
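
The pipe connections listed in Figure 7 can be derived mechanically from the work request. The sketch below is an illustration under several assumptions, not the simulator's COMMAND INTERPRETER: tokens are separated by blanks, I/O redirectors are skipped, omitted ports take the smallest unused number in left-to-right order, and the name `parse_work_request` is hypothetical.

```python
import re

# [out-port] '|' [node number, '$', or label] ['.' in-port]
PIPE = re.compile(r"^(\d*)\|([A-Za-z]\w*|\d+|\$)?(?:\.(\d+))?$")


def parse_work_request(text):
    """Return (nodes, pipes); each pipe is (src, out_port, dst, in_port),
    with nodes identified by their 1-based position on the command line."""
    nodes, labels, raw, pending = [], {}, [], None
    for tok in text.split():
        m = PIPE.match(tok)
        if m:                          # pipe leaving the preceding node
            raw.append((len(nodes), m.group(1), m.group(2), m.group(3)))
        elif tok == "," or ">" in tok:
            continue                   # commas and I/O redirectors: ignored here
        elif tok.startswith(":"):
            pending = tok[1:]          # label for the node that follows
        else:
            nodes.append(tok)
            if pending:
                labels[pending] = len(nodes)
                pending = None

    used_out, used_in = {}, {}

    def take(used, node, explicit):
        ports = used.setdefault(node, set())
        port = int(explicit) if explicit else min(
            p for p in range(1, len(ports) + 2) if p not in ports)
        ports.add(port)
        return port

    pipes = []
    for src, out, dst, inp in raw:
        if dst is None:
            d = src + 1                # default: the node after the pipe
        elif dst == "$":
            d = len(nodes)             # last node on the command line
        elif dst.isdigit():
            d = int(dst)
        else:
            d = labels[dst]
        pipes.append((src, take(used_out, src, out), d, take(used_in, d, inp)))
    return nodes, pipes


FIGURE_7 = "pgm1 | pgm2 1|a 2|b :a pgm3 | pgm4 |c.1 :b pgm5 | pgm6 |.2 :c pgm7"
nodes, pipes = parse_work_request(FIGURE_7)
for src, out, dst, inp in pipes:
    print(f"{nodes[src - 1]} port {out} -> {nodes[dst - 1]} port {inp}")
```

Labels are resolved in a second pass so that a pipe may name a label that appears later on the command line, as `1|a` does in Figure 7.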
In order to simulate the load generated by users entering work requests
from user terminals, a population of work requests is created. The form of
the input for creating the work request population is provided in Figure 8.
Each line of input contains a series of node identifiers followed by a colon,
which is followed by a work request. The node identifiers indicate which
nodes are to contain the given work request as a member of the node's
population of work requests. Therefore, the result of this input is the
construction of a population of work requests for each node. The way the
load generator uses this information is discussed in a subsequent paragraph.


<work  request  population>  ::=  <work  request  entry> 


<work  request  entry> 

<work request entry> ::= { <node identifier> } : <work request>

<node identifier> ::= <integer>

<work request> ::= (see Figure 6)

<integer> ::= <digit> { <digit> }

Examples:

1 2 3 4 5 : pgm1 | pgm2     { the work request 'pgm1 | pgm2'
                              is available on nodes 1, 2, 3,
                              4, and 5 }

1 3 : pgm1                  { the work request 'pgm1' is
                              available on nodes 1 and 3 }


Figure  8.  Syntax  of  Work  Request  Population  Input  to  the  Simulator 
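
As an illustration of how this input yields one population per node, the following sketch builds a dictionary keyed by node identifier. The function name `build_populations` is hypothetical, and the sketch relies on the first colon of each line being the separator (later colons belong to labels inside the work request).

```python
def build_populations(lines):
    """Map each node identifier to the list of work requests available on it."""
    populations = {}
    for line in lines:
        ids, _, request = line.partition(":")   # split at the first colon only
        for node_id in ids.split():
            populations.setdefault(int(node_id), []).append(request.strip())
    return populations


# The two example entries of Figure 8.
pops = build_populations(["1 2 3 4 5 : pgm1 | pgm2", "1 3 : pgm1"])
print(pops[1])   # ['pgm1 | pgm2', 'pgm1']
```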


4.2.4.4 Command Files

Command files are constructed for the simulator using the syntax
described in Figure 9. This input specifies a unique name for the file, the
simulated node at which the file resides, and the commands contained in the
file. These commands conform to the syntax of work requests presented in
Figure 6. These statements make it possible to construct command files on
particular nodes; the files can then be referenced either by commands
originating from user terminals or by other command files.


<command file> ::= C <node id> <command file name>
                   { <work request> }
                   ENDC

<node id> ::= <integer>

<command file name> ::= <up to 8 characters>

<work request> ::= (see Figure 6)

<integer> ::= <digit> { <digit> }


Examples:

C 1 cfile1
pgm1 | pgm2 1|a 2|b :a pgm3 | pgm4 |c.1 :b pgm5 | pgm6 |.2 :c pgm7
pgm1 | pgm5
ENDC


Figure  9.  Syntax  of  Command  File  Input  to  the  Simulator 


4.2.4.5 Object Files

Figure  10  depicts  the  syntax  used  to  express  object  files  in  the 
simulator.  The  input  specifies  a  unique  name  for  the  file,  the  simulated  node 
at  which  the  file  resides,  the  length  of  the  file  in  bytes,  and  the  simulation 
script.  The  script  contains  a  series  of  statements  that  describe  the  process 
actions  that  are  to  be  simulated.  There  are  five  actions  which  can  be 
simulated:  1)  compute,  2)  receive  a  message,  3)  send  a  message,  4)  loop  back 
to  a  previous  command  a  specific  number  of  times,  and  5)  terminate  the  process 
simulation.  By  appropriately  combining  these  commands,  one  can  construct  a 
script  which  simulates  the  activities  of  a  given  user  process. 

4.2.4.6 Data Files

Data files, depicted in Figure 11, are the final type of file which can
be  presented  to  the  simulator.  The  data  file  input  contains  an  identifying 
<object file> ::= O <node id> <object file name> <object file length>
                  { <action> }
                  ENDO

<node id> ::= <integer>

<object file name> ::= <up to 8 characters>

<object file length> ::= <integer>

<action> ::= <comp> | <loop> | <rcv> | <send> | <term>

<comp> ::= c <# of instructions>

<loop> ::= l <instruction #> <count>

<rcv> ::= r <port>

<send> ::= s <port> <size (bytes)>

<term> ::= t

<# of instructions>, <instruction #>, <count>, <port>,
<size> ::= <integer>

<integer> ::= <digit> { <digit> }


Examples:

O 1 object1 1000     (object file is 1000 bytes long)
c 25                 (simulate 25 computation instructions)
l 1 10               (loop back to the first instruction 10 times)
r 2                  (read a message from port 2)
s 4 100              (send a message of 100 bytes in length to port 4)
t                    (terminate the execution of this process)
ENDO


Figure  10.  Syntax  of  Object  File  Input  to  the  Simulator 


name,  a  node  identification  indicating  the  file’s  simulated  location,  and  a 
specification  of  the  file  size.  Data  is  not  actually  stored  by  the  simulator. 

4.2.5  The  Simulator  Design 

The  simulator  is  composed  of  several  modules.  In  each  module,  closely 
related  data  structures  and  the  procedures  that  modify  these  data  structures 
are defined. The only access to the data structure is through these
<data file> ::= D <node id> <data file name> <size>

<node id> ::= <integer>

<data file name> ::= <up to 8 characters>

<size> ::= <integer (bytes)>

<integer> ::= <digit> { <digit> }

Examples:

D 3 testfile 100000     (defines a data file named 'testfile'
                         which will reside on node 3 and will
                         contain 100,000 bytes of information)

Figure  11.  Syntax  of  Data  File  Input  to  the  Simulator 


procedures.  This  design  allows  one  to  isolate  the  portion  of  the  simulator 
that  represents  the  model  of  control  and  conduct  experiments  with  various 
perturbations of the control model. Without this type of design, each
perturbation could easily require significant changes to the entire simulator. The
major  modules  of  the  simulator  are  described  below. 

4.2.5.1 Node Module

The NODE MODULE simulates the hardware activities of each node (e.g.,
the  processor  and  attached  disks).  This  includes  the  simulation  of  user 
activities  as  specified  by  process  scripts  and  the  simulation  of  disk  traffic. 
In  addition,  this  module  provides  the  local  operating  system  functions  of 
dispatching,  blocking  processes  for  message  transmission  or  reception,  and 
unblocking  processes. 

4.2.5.2 Message System

All  activities  dealing  with  messages  are  handled  by  the  MESSAGE  SYSTEM. 
Among  the  services  provided  by  this  module  are  the  following:  1)  routing  of 
messages,  2)  placement  of  messages  in  LINK  QUEUES,  3)  transmission  of  messages 
across  a  link,  4)  transmission  of  acknowledgement  signals  to  the  source  end  of 
a  link,  and  5)  placement  of  messages  in  PORT  QUEUES. 
4.2.5.3 File System

The  FILE  SYSTEM  stores  the  various  types  of  files,  which  include  object, 
command,  and  data  files.  It  stores  the  scripts  for  object  files  and  provides 
access  to  the  scripts.  Similarly  for  command  files,  it  stores  the  work 
requests  for  each  command  file  and  controls  access  to  the  file.  It  maintains 
directories  that  provide  location  information  and  access  control  information. 
All  executive  control  actions  pertaining  to  the  file  system  are  contained  in 
this  module. 

4.2.5.4 Command Interpreter

The  COMMAND  INTERPRETER  parses  work  requests  and  constructs  the  task 
graph  describing  the  initial  resource  requirements  for  a  work  request. 

4.2.5.5 Task Set and Process Manager

The TASK SET AND PROCESS MANAGER performs all control activities
required to manage all phases of execution of a work request. This includes
activating the COMMAND INTERPRETER; communicating with the FILE SYSTEM in
order to gather information, allocate files, or deallocate files; performing
work distribution and resource allocation; and managing active processes.

4.2.5.6 Load Generator

Work  request  traffic  originating  from  the  user  terminals  attached  to 
each  node  is  created  by  the  LOAD  GENERATOR.  A  series  of  work  requests 
provided  by  a  user  at  a  terminal  is  called  a  user  session.  To  simulate  a  user 
session, the LOAD GENERATOR randomly chooses a session length from a
user-specified interval. A session starting time (measured in seconds) is also
chosen at random from a user-specified interval. Each work request for the
user session is chosen at random from the population of work requests
originally created for each node via the input statements described above (see
Figure 8). The LOAD GENERATOR also simulates the "think time" between work
requests by randomly choosing a time (measured in seconds) from a
user-specified interval.
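
The session-generation procedure just described might be sketched as follows. This is an illustration only: the function name `generate_session` and its parameters are hypothetical, the session length is assumed to be a count of work requests, and times are assumed to be drawn as whole seconds with a uniform distribution, none of which the text states explicitly.

```python
import random


def generate_session(rng, population, start_range, length_range, think_range):
    """Return (start_time, [(think_time, work_request), ...]) for one session."""
    start = rng.randint(*start_range)        # session starting time, seconds
    length = rng.randint(*length_range)      # number of work requests (assumed)
    requests = [(rng.randint(*think_range),  # think time before the request
                 rng.choice(population))     # request drawn from the node's pool
                for _ in range(length)]
    return start, requests


rng = random.Random(0)                       # seeded for repeatable runs
start, requests = generate_session(
    rng, ["in> cmnd"], start_range=(1, 15), length_range=(2, 4),
    think_range=(1, 15))
```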

4.2.6  Performance  Measurements 

Performance  measurements  are  made  concerning  three  types  of  data:  1) 
the  quantity  of  message  traffic,  2)  the  magnitudes  of  various  queue  lengths 
and  their  associated  waiting  times,  and  3)  the  size  of  average  work  request 
response  times  and  throughput. 
To identify the impact of the executive control on the communication
system,  various  communication  measurements  are  obtained.  A  cumulative  total 
of  the  number  of  user  messages  and  control  messages  over  the  entire  system  is 
maintained.  This  allows  one  to  compare  the  number  of  control  messages  to  the 
number  of  user  messages  and  thus  identify  how  the  communication  system  is 
being  utilized.  In  addition,  a  count,  again  categorized  by  user  messages  and 
control  messages,  is  maintained  in  matrix  form  to  identify  the  total  number  of 
messages  originating  at  a  particular  node  and  destined  for  every  other  node. 
Traffic  counts  on  each  communication  link  are  also  recorded  according  to  their 
classification as user messages or control messages. Finally, activity in the
LINK QUEUES, where messages wait to be transmitted over each link, is
monitored. These measurements include minimum queue length, maximum queue
length, average queue length, minimum waiting time in the queue, maximum
waiting time, and average waiting time.

In  addition  to  measurements  concerned  with  the  LINK  QUEUES,  a  similar 
analysis  of  process  queues  is  performed.  The  queues  on  each  node  that  are 
analyzed are the READY QUEUE (processes waiting for access to the CPU),
MESSAGE BLOCKED QUEUE (processes that are either waiting to place a message in
a LINK QUEUE or waiting to receive a message), and DISK WAITING QUEUES
(processes  waiting  for  access  to  a  particular  disk).  The  types  of 
measurements  obtained  are  identical  to  those  for  the  LINK  QUEUES. 
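
The six queue measurements named above can be collected with a small accumulator. The sketch below is hypothetical (the class `QueueStats` is not taken from the simulator) and assumes that the average queue length is weighted by the time spent at each length.

```python
class QueueStats:
    """Length and waiting-time statistics for one simulated queue."""

    def __init__(self, now=0.0):
        self.length = self.min_len = self.max_len = 0
        self.area = 0.0                  # integral of queue length over time
        self.start = self.last = now
        self.waits = []

    def _advance(self, now):
        self.area += self.length * (now - self.last)
        self.last = now

    def enqueue(self, now):
        self._advance(now)
        self.length += 1
        self.max_len = max(self.max_len, self.length)

    def dequeue(self, now, enqueued_at):
        self._advance(now)
        self.length -= 1
        self.min_len = min(self.min_len, self.length)
        self.waits.append(now - enqueued_at)

    def summary(self, now):
        # Assumes now > start and that at least one item has been dequeued.
        self._advance(now)
        return {
            "min_len": self.min_len, "max_len": self.max_len,
            "avg_len": self.area / (now - self.start),
            "min_wait": min(self.waits), "max_wait": max(self.waits),
            "avg_wait": sum(self.waits) / len(self.waits),
        }


q = QueueStats()
q.enqueue(0.0)
q.enqueue(1.0)
q.dequeue(3.0, enqueued_at=0.0)
q.dequeue(4.0, enqueued_at=1.0)
stats = q.summary(4.0)
```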

To  identify  the  effectiveness  of  the  control  strategy,  measurements  are 
obtained  that  identify  how  effectively  user  processing  is  accomplished.  For 
each  node  and  cumulatively  for  all  nodes,  the  following  measurements  are 
obtained  for  user  sessions,  work  requests,  and  processes: 

1. The total number of user sessions, work requests, and processes.

2. The average service time for each user session, work request,
   and process.

3. The average response time for each user session, work request,
   and process.

4. The throughput for user sessions, work requests, and processes.
SECTION 5

THE SIMULATION EXPERIMENTS

In the second phase of experimentation, two groups of simulation
experiments designed to measure the performance of the various models in an
FDPS environment are conducted. In addition, a number of experiments are
conducted with a single node network. In the first group of FDPS experiments,
only one work request is processed by the entire network. The intent of this
set of experiments is to determine the minimum delay experienced by a work
request with each model of control. In the second group of experiments, a
load is placed on all nodes. These studies are designed to examine the
behavior of the various models of decentralized control operating in a
production mode with various physical interconnection topologies. The single
node experiments provide a means of comparing the performance of an FDPS to
that of isolated uniprocessors.

5.1  THE  SIMULATION  ENVIRONMENTS 

The  environment  in  all  FDPS  experiments  consists  of  a  network  of  five 
nodes  interconnected  in  various  ways  providing  five  different  interconnection 
topologies: 1) a unidirectional ring, 2) a bidirectional ring, 3) a star,
4) a fully connected network, and 5) a tree. (See Figure 12.) The nodes of each
network  (see  Figure  2)  are  all  homogeneous,  and  each  consists  of  a  processor 
capable  of  executing  one  million  instructions  per  second.  Connected  to  each 
node  are  ten  user  terminals  and  three  disk  drives.  The  disks  are  assumed 
identical,  each  with  an  average  latency  of  100  microseconds  and  a  transfer 
rate  of  500,000  bytes  per  second. 

5.1.1  Environmental  Variables 

In  addition  to  different  topologies,  the  bandwidth  of  the  communication 
links  and  the  model  of  control  are  also  varied  for  the  experiments.  Table  3 
provides  a  brief  comparison  of  the  various  models.  Only  the  first  four  models 
of control (XFDPS.1, XFDPS.2, XFDPS.3, and XFDPS.4) are utilized in these
initial experiments. Models XFDPS.5 and XFDPS.6 differ from model XFDPS.1 in
details that are not examined in these experiments. Therefore, they are not
considered here because their observable results would be identical to those
of XFDPS.1. It is instructive, though, to note that not all model variations
result in performance differences. Finally, it should be noted that the
central directory of model XFDPS.2 is maintained on node 1 in all experiments.

5.1.2  Environmental  Constants 

Several  environmental  features  remain  constant  for  all  experiments.  In 
all  cases,  it  is  assumed  that  all  control  messages  are  50  bytes  long.  All 
control  models  utilize  the  same  policy  for  distributing  work  and  allocating 
resources.  This  policy  simply  requires  all  processes  to  execute  on  the  node 
where  the  object  code  for  that  process  resides.  There  is  only  one  copy  of  the 
object  code  for  each  process  in  the  network  for  these  initial  experiments. 
The  work  distribution  and  resource  allocation  policy  utilized  for  these  tests 
requires  that  data  files  be  accessed  at  the  location  where  they  originally 
reside and not be moved prior to execution. In every experiment, all files
are unique, thus leaving the control with only one resource allocation
alternative.

The work requests arriving at all nodes are of the type 'in> cmnd'. The
data file 'in' provides input to the process resulting from the loading of the
object file 'cmnd'. This provides an environment in which files are accessed
only by means of reads, thus eliminating the possibility that certain work
requests are either delayed or aborted due to insufficient resources.
Therefore, it is guaranteed that all control activity results in the
successful completion of a work request.

In all cases, the object file 'cmnd' and data file 'in' are located on
the same node. This means that all file accesses are local file accesses and
thus control message traffic is free of competition by user messages for
communication resources. This provides an environment in which the effects of
the control models can be more directly observed without the influence of an
unpredictable collection of user messages.

The object files in each case specify the execution of the same script,
which is depicted in Figure 13. This script describes a process that
alternately computes and reads from a data file for 501 iterations. Given the
speed of the processors utilized in the experiments, this results in
approximately 5 seconds of CPU time for each process.
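
The 5-second figure can be checked by stepping through the script of Figure 13. The interpreter below is a minimal sketch under stated assumptions: the function name `count_compute_instructions` is hypothetical, a loop action `l n k` is taken to jump back to instruction n exactly k times, and only compute actions are charged CPU time.

```python
def count_compute_instructions(script):
    """Step through a script and total the instructions of its 'c' actions."""
    loops_left = {}                  # index of a loop action -> jumps remaining
    total, pc = 0, 0
    while pc < len(script):
        op, *args = script[pc].split()
        if op == "c":
            total += int(args[0])
        elif op == "l":
            target, count = int(args[0]), int(args[1])
            remaining = loops_left.get(pc, count)
            if remaining > 0:
                loops_left[pc] = remaining - 1
                pc = target - 1      # instructions are numbered from 1
                continue
            loops_left.pop(pc, None)
        elif op == "t":
            break
        pc += 1                      # 'r' and 's' cost no CPU in this sketch
    return total


FIGURE_13 = ["c 10000", "r 1", "l 1 500", "t"]
instructions = count_compute_instructions(FIGURE_13)
cpu_seconds = instructions / 1_000_000      # one million instructions per second
print(instructions, cpu_seconds)            # 5010000 5.01
```

Looping back 500 times executes the compute and read actions 501 times, which is where the 501 iterations come from.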
Table 3. Comparison of the Control Models

c 10000     { 10,000 compute instructions }
r 1         { read from port 1 }
l 1 500     { loop back to instruction one 500 times }
t           { terminate the process }


Figure  13.  The  Script  Utilized  By  All  Processes 


5.2  GROUP  1  EXPERIMENTS 

5.2.1  The  Environment 

The  first  group  of  experiments  is  designed  to  demonstrate  the  minimum 
delay  experienced  by  a  single  work  request  as  a  result  of  utilizing  each  model 
of  control.  In  this  set  of  experiments,  all  topologies  are  investigated  in 
addition  to  various  bandwidths  ranging  from  1200  to  500,000  bytes  per  second. 
These  experiments  examine  situations  in  which  work  requests  arrive  at  both 
nodes  1  and  2.  In  addition,  the  location  of  the  object-data  file  pairs  named 
in  the  work  request  are  varied  over  all  five  nodes. 

Each  of  these  tests  requires  the  simulator  to  process  only  one  work 
request,  thus  eliminating  competition  for  resources  by  other  work  requests. 
The  work  request  response  times  for  each  environment  (model,  topology,  band¬ 
width,  and  location  of  object-data  file  pair)  are  provided  in  Appendix  2.1. 

5.2.2 Observations

A  comparison  of  the  results  of  this  set  of  experiments  can  be  seen  in 
Figures  14  and  15.  In  Figure  14,  the  results  of  work  requests  arriving  at 
node  1  can  be  seen.  Node  1  is  chosen  in  order  to  demonstrate  how  XFDPS.2  (the 
model  with  a  centralized  file  system  directory  located  on  node  1)  can  benefit 
from  the  location  of  a  work  request.  In  all  cases,  model  XFDPS.2  provides  the 
smallest  response  times.  When  the  work  request  arrives  at  another  node  (e.g., 
node  2)  XFDPS.2  no  longer  provides  the  minimum  response  time  in  all  cases. 

The  sensitivity  of  XFDPS.2  to  the  location  of  the  work  request  can  be 
attributed  to  the  location  of  the  central  file  system  directory  on  node  1.  If 
a work request arrives at node 1, all resource allocation can be performed
without  requiring  the  transmission  of  any  control  messages.  The  only  control 
messages  needed  are  those  necessary  to  activate  the  file  processes  for  each 
file  named  in  the  work  request.  These  messages  are  transmitted  once  the  files 
Figure 14. Comparison of the Response Times for Models 1, 2, 3, and 4
that Were Obtained from the Group 1 Experiments in Which
Work Requests Arrived at Node 1

Notation: i > j means response time using model i is greater than that using j

Figure 15. Comparison of the Response Times for Models 1, 2, 3, and 4
that Were Obtained from the Group 1 Experiments in Which
Work Requests Arrived at Node 2

have been allocated. If the work request arrives at node 2, a message must be
sent  to  node  1  in  order  to  allocate  the  resources.  Once  the  resources  have 
been  allocated,  the  messages  to  activate  file  processes  can  be  sent. 
Therefore  a  two  stage  operation  with  two  sets  of  messages  results  from  this 
scenario. 

XFDPS.1 and XFDPS.3 provide an alternate strategy which explains their
superior performance to XFDPS.2 when the work request arrives at node 2. In
these  models,  file  allocation  and  file  process  activation  are  accomplished 
with  one  message  because  the  directory  for  a  file  and  the  file  itself  reside 
on  the  same  node.  Therefore,  once  a  file  has  been  allocated,  the  file  process 
can  be  activated  with  an  intranode  operation. 

In all but two cases, XFDPS.4 results in the largest response time of
all the models. Only when the work request arrives at node 2 in a network
consisting of a unidirectional ring with a bandwidth of 1200 bytes per second
does this model perform better than the other models. This particular
topology provides the longest paths between nodes, thus making it quite
susceptible to communication problems. Model XFDPS.4 performs better at low
bandwidths than the other models for this particular topology because only one
message is present on the communication net while a work request is being
processed. During the resource allocation phase, the update vector (UPV)
circulates about the ring; and, after this step, the control vector (CV) is
present on the ring. In all other models, multiple messages are utilized to
process a work request; thus, at low bandwidths, message throughput becomes a
problem.
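
A rough calculation shows why path length matters at 1200 bytes per second. The sketch below is not part of the simulator: it assumes store-and-forward transmission of a 50-byte control message with no queueing or processing delay, it assumes the ring runs 1, 2, 3, 4, 5, 1, and the names `hops` and `worst_case_delay` are hypothetical.

```python
from collections import deque


def hops(links, src, dst):
    """Fewest links on a directed path from src to dst (breadth-first search)."""
    seen, frontier = {src: 0}, deque([src])
    while frontier:
        node = frontier.popleft()
        for a, b in links:
            if a == node and b not in seen:
                seen[b] = seen[node] + 1
                frontier.append(b)
    return seen[dst]


nodes = range(1, 6)
uni_ring = [(i, i % 5 + 1) for i in nodes]             # 1->2->3->4->5->1
bi_ring = uni_ring + [(b, a) for a, b in uni_ring]     # both directions
full = [(a, b) for a in nodes for b in nodes if a != b]


def worst_case_delay(links, bandwidth, size=50):
    """Longest shortest path, and the time to push `size` bytes along it."""
    longest = max(hops(links, a, b) for a in nodes for b in nodes if a != b)
    return longest, longest * size / bandwidth


for name, links in [("unidirectional ring", uni_ring),
                    ("bidirectional ring", bi_ring),
                    ("fully connected", full)]:
    print(name, worst_case_delay(links, bandwidth=1200))
```

Under these assumptions a control message can need four hops on the unidirectional ring, about 0.17 seconds at 1200 bytes per second, against a single hop on the fully connected network.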

Finally,  the  outstanding  performance  of  XFDPS.3  when  the  object  and  data 
files  named  in  a  work  request  reside  on  the  same  node  as  the  work  request 
should  be  noted.  This  is  a  clear  demonstration  of  the  savings  possible  with 
this policy. One should also note that the performance of XFDPS.1 and XFDPS.3
is identical when the named files are on a node different from the one
receiving the work request.

5.3  GROUP  2  EXPERIMENTS 

The  first  set  of  experiments  demonstrates  fundamental  differences  in  the 
performance  of  the  models  when  handling  individual  work  requests,  but  this 
type of experiment can often be deceiving. When multiple work requests are
processed  concurrently,  the  simultaneous  demands  on  resources  can  result  in 
unexpected  delays  which  cannot  be  anticipated  with  the  data  obtained  from  the 
first  set  of  experiments. 

5.3.1 The Environment

The goal of this set of experiments is to simulate and examine a
production environment. It would be desirable to establish identical loads for all
experiments,  but  the  nature  of  the  problem  makes  this  impossible.  The  basic 
environment  consists  of  a  network  of  five  nodes  with  ten  user  terminals 
attached  to  each  node.  To  provide  an  identical  load,  one  would  have  to 
guarantee  that  the  work  requests  will  be  presented  to  the  simulator  in  the 
same  order  for  each  experiment.  The  control  models,  though,  are  composed  of 
autonomous  components  and  by  their  design  will  process  work  requests  on  each 
node  at  different  rates  as  demonstrated  by  the  results  of  the  group  1 
experiments.  This  implies  that  even  if  the  work  requests  at  each  node  are 
presented  in  the  same  order,  the  load  provided  to  the  simulator  will  be 
different  because  the  timing  of  work  request  arrivals  may  vary. 

To  clarify  this  point,  consider  the  following  example.  Assume  the  loads 
provided  to  nodes  1  and  2  are  as  shown  in  Figure  16.  This  figure  depicts  the 
order  in  which  the  work  requests  arrive  at  each  node.  Because  the  control 
models  process  work  requests  at  different  rates,  different  processing 
sequences  are  obtained  for  the  control  models.  Figure  17  depicts  the  sequence 
for  model  1  and  Figure  18  depicts  that  for  model  2.  Thus,  although  the  loads 
at  each  node  are  controlled,  it  is  impossible  to  control  the  sequence  of  work 
requests  on  all  nodes  collectively. 


Load at Node 1      Load at Node 2

     WR1                 WR5
     WR2                 WR6
     WR3                 WR7
     WR4                 WR8


Figure  16.  Example  of  Loads  Presented  to  Two  Nodes 
Node 1   WR1   WR2   WR3   WR4

Node 2   WR5   WR6   WR7   WR8

Time ------>


Figure 17. Sequence of Work Request Arrivals When Using Model 1


Node 1   WR1   WR2   WR3   WR4

Node 2   WR5   WR6   WR7   WR8

Time ------>


Figure  18.  Sequence  of  Work  Request  Arrivals  When  Using  Model  2 


Since  identical  loads  cannot  be  provided,  we  attempt  to  construct  an 
unbiased load. Each terminal issues its first work request at a time measured
in seconds corresponding to an integral value chosen at random from the
interval [1, 15]. After a work request has completed, the arrival time (measured
in  seconds)  of  the  next  work  request  from  the  terminal  is  again  chosen  by 
selecting  a  random  value  in  the  interval  [1,  15]  as  the  delay  from  the 
termination  of  the  previous  work  request.  The  work  requests  are  chosen  at 
random  from  a  common  pool  of  work  requests.  Each  work  request  in  the  pool  is 
of  the  type  described  earlier  in  section  5.1.2  naming  object-data  file  pairs 
in  which  both  the  object  file  and  data  file  reside  on  the  same  node.  There  is 
an  equal  number  of  object-data  file  pairs  on  each  node.  Therefore,  the 
probability  that  a  newly  arrived  work  request  names  an  object-data  file  pair 
residing on node i is 1/5 for i = 1, ..., 5.

In  order  to  obtain  steady  state  data,  the  taking  of  measurements  is 
delayed  until  a  simulation  time  of  30  seconds  after  the  start  of  the  test. 
This  insures  that  all  terminals  are  active  and  are  into  their  normal 
activities.  Measurements  are  then  taken  until  330  seconds  into  the  simulation 
thus  providing  a  measurement  interval  of  5  minutes.  This  provides  observation 
of  the  processing  of  over  200  work  requests.  Longer  simulation  intervals, 
though desirable, are not practical due to the extensive computation necessary
to simulate the level of detail provided by the control models being examined.
Most runs have required over three hours of computing time on
a Prime 550. (The performance of the Prime 550 is approximately
80% of that of an IBM 370/158 and 35% of that of a VAX 11/780 [Henk81].) Over
160 simulation runs have been made during the course of this research.
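
The measurement window can be illustrated with a short Python sketch. It is an assumption-laden example rather than the simulator's own logic: the function name `average_response_time` is hypothetical, and the choice to include a work request according to its completion time is an assumption, since the report does not say which event decides inclusion.

```python
WARMUP_END = 30.0    # seconds; measurements start here
MEASURE_END = 330.0  # seconds; measurements stop here (5 minute window)


def average_response_time(records, start=WARMUP_END, end=MEASURE_END):
    """Average response time over the measurement window.

    records holds (arrival_time, completion_time) pairs in seconds.
    Assumption: a work request counts if it completes inside the window.
    Returns None when no work request qualifies.
    """
    samples = [done - arrived
               for arrived, done in records
               if start <= done <= end]
    if not samples:
        return None
    return sum(samples) / len(samples)
```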

In  this  set  of  experiments,  the  following  three  factors  are  varied:  1) 
control model, 2) topology, and 3) bandwidth. Experiments utilizing all possible
combinations of these factors are run. The results of these experiments
are  provided  in  Appendix  2.2. 

5.3.2 Observations

The  most  distinguishing  feature  of  the  results  of  these  tests  is  the 
lack of significant variation in average response time for experiments utilizing
all models and topologies with bandwidths of 1200 bytes per second or larger.
In all cases, the LINK QUEUES have an average length of between one and two
messages, implying that the communication system does not prove to be a
bottleneck.

To  demonstrate  that  the  values  for  average  response  time  could  be 
explained  by  delays  due  to  the  intranode  multitasking  of  processes, 
experiments  utilizing  the  extremely  high  bandwidth  of  2.5  million  bytes  per 
second  are  conducted.  The  results  are  very  similar  to  those  obtained  with 
much  lower  transmission  rates.  In  addition,  a  simulation  of  a  single  node 
network  is  conducted.  This  also  results  in  an  average  response  time  not 
significantly  different.  (The  results  of  the  single  node  simulation  are 
provided  in  Appendix  2.3.) 

In most cases when the bandwidth is lowered to values below 600 bytes
per second, a statistically significant increase in response times is observed.
In most cases, either XFDPS.2 or XFDPS.4 provided the smallest average
response time values. It is necessary, though, to reduce the bandwidth to
extremely low values in order to observe these differences, thus leading us to
conclude that, as far as contrasting the various models is concerned, the data
are rather inconclusive.

Finally,  the  results  of  the  experiments  with  model  XFDPS.2  provide  one 
further  observation.  Recall  that  in  this  model  a  single  centralized  file 
system directory is maintained. All file system requests are handled by the
node housing this directory. Therefore, one would expect the performance of
this node to be somewhat degraded due to the control activity required to
satisfy the file system requests. The results, though, show that this is not
the case. The average response times for work requests arriving at the node
where the central directory is maintained (node 1) do not differ significantly
from those on other nodes. This result implies that the amount of file system
management work is negligible and therefore does not lead to any performance
degradation.

5.4 SINGLE NODE NETWORK EXPERIMENTS

5.4.1  The  Environment 

This  set  of  experiments  is  considered  separately  from  those  described 
above  because  its  purpose  is  not  to  analyze  the  relative  performance  of  the 
control models. These experiments are designed to provide a standard against
which the other results can be compared in order to determine the impact of
distributed  processing  on  average  response  time  for  work  requests. 

The  configuration  of  the  single  node  comprising  the  network  in  this  set 
of  experiments  is  identical  to  that  for  each  node  in  the  other  experiments. 
The  work  requests  name  object-data  file  pairs  and  the  script  for  the  object 
file  is  the  same  as  that  employed  in  the  first  two  groups  of  experiments. 
Since  there  is  no  internode  communication,  the  choice  of  the  control  model  is 
of  no  consequence,  and  therefore  XFDPS.1  is  arbitrarily  selected. 

5.4.2  Observations 

Five  simulations  are  conducted  and  the  results  of  those  runs  are 
presented  in  Appendix  2.3.  The  values  for  average  response  time  from  these 
experiments  are  similar  to  those  found  in  the  first  group  of  experiments  when 
bandwidths greater than 600 bytes/sec are used.


SECTION 6
CONCLUSIONS 

6.1  QUALITATIVE  ASPECTS  OF  THE  MODELS 

The evaluation of the control models would be incomplete if consideration
were given only to the quantitative results provided by the simulation
experiments.  It  is  also  important  to  examine  certain  qualitative  aspects  of 
the  models  which  were  not  quantitatively  evaluated.  These  aspects  include  the 
ability  to  provide  fault-tolerant  operation  (e.g.,  graceful  degradation  and 
restoration),  the  ability  for  the  system  to  expand  gracefully,  and  the  ability 
to  balance  the  system  load. 

6.1.1  XFDPS.1 

The  XFDPS.1  model  is  a  truly  distributed  and  decentralized  model  of 
control.  In  this  model,  resources  are  partitioned  along  node  boundaries  and 
managed  by  components  residing  on  the  same  node  as  the  resource.  This  design 
enables  the  system  to  remain  in  operation  in  the  presence  of  a  failure.  In 
such  a  situation,  those  nodes  not  available  are  simply  not  contacted  when 
queries  concerning  resources  are  made.  The  failed  nodes  are  also  not 
considered  as  locations  for  the  execution  of  tasks  during  the  formulation  of 
the  work  distribution  and  resource  allocation  decision. 

This  model  of  control  requires  some  activity  on  the  part  of  all  nodes  in 
order  to  satisfy  each  work  request.  There  is  no  single  node  that  is  by  design 
supposed  to  receive  any  more  activity  than  any  other  node;  instead,  the  work 
is  spread  across  all  nodes.  In  addition,  global  information  for  the  work 
distribution  and  resource  allocation  decision  is  obtained  for  each  work 
request  as  it  is  processed.  This  global  data  enables  the  control  to  better 
balance  the  load  across  the  network. 

This  control  model  is  not  without  its  problems.  The  global  searches  for 
resources  that  occur  for  every  work  request  may  be  unnecessary  (e.g.,  in  those 
instances in which only local resources are required). Short local jobs
therefore suffer at the expense of the longer jobs utilizing non-local resources.

6.1.2  XFDPS.2 

XFDPS.2  utilizes  a  single  centralized  file  system  directory.  On  the 
surface,  this  model  appears  to  be  simple  to  implement.  A  central  directory  is 
maintained,  and  all  file  system  queries  are  sent  to  the  node  housing  that 
directory. However, problems result when fault-tolerant operation is desired.
No longer can a single central directory be maintained because the loss of the
node housing the directory would be catastrophic. Alternative strategies
which provide for fault-tolerant operation (see for example Garcia-Molina's
technique described in [Garc79] for providing fault tolerance in a centralized
locking distributed data base system) significantly complicate the design of
the control as well as require a significant expenditure of resources in order
to recover from a failure. It should be noted that the simulation of XFDPS.2
does not account for the overhead required to provide fault-tolerant
operation. Therefore, the average work request response times observed in the
experiments are lower than would be expected if the necessary control features
for providing fault-tolerant operation were present.

Model XFDPS.2 also has problems with growth. When a new node is
introduced  into  the  system,  a  large  amount  of  work  is  required  to  update  the 
central  directory  to  add  the  resources  of  the  new  node.  This  factor  can  be 
quantified  and  will  be  the  subject  of  future  experiments. 

6.1.3  XFDPS.3 

The  XFDPS.3  model  is  similar  to  XFDPS.1.  It  differs  in  its  policy  for 
obtaining  file  availability  information.  First  a  local  search  is  made.  If 
all  resources  are  found,  they  are  utilized;  otherwise,  a  global  search  for 
resources is conducted. As described in Section 5, this model provides faster
response to work requests utilizing only local resources, as expected. Due to
its information gathering policy, however, the potential for utilizing distant
resources in order to balance the load is sacrificed because resource availability
on other nodes may never be considered.
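
The local-first search policy can be sketched as follows. This is a simplified illustration, not the XFDPS.3 pseudo code itself: the function `find_files` and the dictionary-based directories are hypothetical, and message passing between managers is replaced by direct lookups.

```python
def find_files(needed, local_node, directories):
    """Local-first search for file availability.

    directories maps a node id to the set of file names available there
    (a hypothetical stand-in for the per-node file set managers).
    Returns (locations, nodes_queried), where locations maps each needed
    file to the list of nodes found to hold it.
    """
    local = directories[local_node]
    if all(name in local for name in needed):
        # Everything is local: no global search is made, so copies on
        # other nodes are never considered for load balancing.
        return {name: [local_node] for name in needed}, [local_node]

    # Otherwise fall back to a global search over every other node.
    queried = [local_node] + sorted(n for n in directories if n != local_node)
    locations = {name: [] for name in needed}
    for node in queried:
        for name in needed:
            if name in directories[node]:
                locations[name].append(node)
    return locations, queried
```

The first return path shows the trade-off noted in the text: the request is answered after contacting one node, but the decision is made without any knowledge of resources elsewhere.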

6.1.4 XFDPS.4

XFDPS.4  utilizes  redundant  copies  of  the  file  system  directory  on  all 
nodes.  Access  to  the  directory  is  restricted  to  the  node  possessing  the 
control  vector  that  is  passed  among  the  nodes  of  the  network.  This  model 
tends to work somewhat like a batch system by delaying file system requests
until the control vector (CV) is received and processing these requests as a
batch.

The presence of the replicated file directory implies that there is both
duplication of information storage and duplication of effort as consistency is
maintained across the replicated copies. Since file system requests are
delayed  until  the  CV  arrives,  jobs  with  very  short  service  times  may 
experience  unusually  large  response  times.  Finally,  as  with  XFDPS.2,  the 
introduction  of  a  new  node  requires  a  large  amount  of  work  in  order  to  update 
the  replicated  directories. 
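
The batching effect of the control vector can be made concrete with a toy calculation. The sketch below rests on several assumptions that are not taken from the report: the CV visits nodes in a fixed round-robin order, each hop takes a constant `hop_time`, nodes are numbered from 0, and a batch is processed instantly on arrival. The function name `cv_waits` is hypothetical.

```python
def cv_waits(num_nodes, arrivals, hop_time, num_visits):
    """Delay each file system request experiences waiting for the CV.

    arrivals is a list of (time, node, name) tuples.  The CV is assumed
    to be at node (k mod num_nodes) at time k * hop_time.  Returns a dict
    mapping request name to its waiting time; requests not reached within
    num_visits visits are omitted.
    """
    pending = sorted(arrivals)
    waits = {}
    for k in range(num_visits):
        visit_time = k * hop_time
        holder = k % num_nodes
        remaining = []
        for t, node, name in pending:
            if node == holder and t <= visit_time:
                waits[name] = visit_time - t  # served as part of the batch
            else:
                remaining.append((t, node, name))
        pending = remaining
    return waits
```

With three nodes and a hop time of 2 seconds, a request arriving at node 0 just after the CV has left waits almost a full circulation (nearly 6 seconds) regardless of how short its own service time is.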

6.1.5  XFDPS.5 

XFDPS.5  is  nearly  identical  to  XFDPS.1,  differing  only  in  its  policy  of 
not  locking  or  in  any  way  reserving  resources  prior  to  the  formulation  of  a 
work distribution and resource allocation decision. With this policy, resources
are not expected to be needlessly tied up in most cases. A problem does
exist  if  the  chosen  resources  cannot  be  locked  once  selected  for  allocation. 
In  this  case,  a  new  resource  allocation  decision  must  be  made  and  already 
allocated  and  locked  resources  may  need  to  be  released. 
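
The decide-then-lock policy and its failure path can be sketched as below. This is an illustrative simplification under stated assumptions: `allocate` is a hypothetical name, the lock table is a plain dictionary, and the sequence of alternative decisions is supplied by the caller rather than recomputed by a work distributor.

```python
def allocate(candidates, locks, owner):
    """Try successive allocation decisions without reserving in advance.

    candidates is a list of alternative resource lists in order of
    preference (each stands for one allocation decision).  locks maps a
    resource name to its current owner, or None when free.  Returns the
    first decision whose resources could all be locked, or None.
    """
    for choice in candidates:
        acquired = []
        for res in choice:
            if locks.get(res) is None:
                locks[res] = owner
                acquired.append(res)
            else:
                # A chosen resource was taken in the meantime: release
                # what was already locked and make a new decision.
                for held in acquired:
                    locks[held] = None
                break
        else:
            return choice
    return None
```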

6.1.6  XFDPS.6 

XFDPS.6  differs  from  XFDPS.1  in  the  manner  in  which  the  task  graph  and 
task  activation  are  handled.  In  this  model,  the  tasks  of  a  work  request  that 
are  chosen  to  execute  on  the  same  node  are  presented  to  the  PROCESS  MANAGER  of 
the  selected  node  collectively.  A  task  graph  identifying  this  collection  of 
tasks  is  constructed  and  task  activation  and  termination  are  handled  by  the 
PROCESS  MANAGER.  Thus,  the  TASK  SET  MANAGER  need  send  only  one  message  to 
each  of  the  nodes  utilized  by  the  work  request  in  order  to  activate  all  tasks. 
In  addition,  only  one  termination  message  is  received  from  each  node.  Further 
savings  are  provided  because  the  PROCESS  MANAGER  on  the  node  where  the  tasks 
are  executing  can  immediately  release  the  resources  utilized  by  the  tasks  as 
each  task  terminates. 
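
A small worked count shows the size of the saving. The sketch assumes, as the comparison in the text implies, that the ungrouped scheme exchanges one activation message and one termination message per task; the function name `control_messages` is hypothetical.

```python
def control_messages(task_nodes, grouped):
    """Activation plus termination messages for one work request.

    task_nodes lists the node chosen for each task.  With grouped=True
    (the XFDPS.6 style) one activation and one termination message are
    exchanged per node used; otherwise one of each per task (assumed).
    """
    units = len(set(task_nodes)) if grouped else len(task_nodes)
    return 2 * units
```

For six tasks placed on nodes 1, 1, 2, 3, 3, 3 this gives 12 messages without grouping and 6 with grouping.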

6.2 CONCLUSIONS

One  must  remember  when  analyzing  the  results  in  Appendix  2  that  only 
control  message  traffic  is  present  during  these  simulation  experiments.  The 
simulation experiments may be inconclusive in establishing the relative merits
of the various models. They do, though, demonstrate the utility of the fully
distributed processing concept. Even networks with communication links possessing
low bandwidths appear to be feasible candidates for fully distributed
processing if the message traffic is held mainly to control messages. In
particular, the experiment with the single node network leads one to expect
that there will be little or no performance loss experienced with an FDPS.

One  of  the  most  important  results  of  this  research  is  the  production  of 
a  simulator  for  the  analysis  of  fully  distributed  processing  systems.  The 
experience  gained  from  the  simulator  has  been  the  basis  for  the  proposal  of 
several interesting experiments to be conducted in the future.


FUTURE EXPERIMENTS

This  work  has  suggested  several  future  experiments.  First,  networks  of 
increasing  numbers  of  nodes,  possibly  10,  15,  and  20  node  networks,  will  be 
investigated  to  determine  at  what  point  the  utility  of  the  various  models  is 
lost. In addition, experiments with both user message traffic and control
message  traffic  will  be  investigated  in  order  to  observe  the  sensitivity  of 
the  various  models  in  the  presence  of  a  busy  communication  system.  Different 
resource  allocation  and  work  distribution  algorithms  will  be  instrumented  into 
the  simulator  in  order  to  determine  under  what  conditions  each  algorithm  is 
appropriate. 

The  issue  of  the  dynamic  addition  and  deletion  of  resources  will  also  be 
examined.  This  will  demonstrate  how  gracefully  the  various  models  can  adapt 
to a growing system. These experiments will also examine the fault-tolerant
capabilities of the various models.


CONTROL MODEL PSEUDO CODE


1.1  PSEUDO  CODE  FOR  THE  XFDPS.1  CONTROL  MODEL 

1.1.1 System Initiator


 1:  process system_initiator;
 2:  { Every node possesses one of these processes.  This process
 3:    initiates a node in the network by assigning 'task_set_manager'
 4:    processes to each connected user terminal, activating the
 5:    'file_system_manager' process, and activating the
 6:    'processor_utilization_manager' process. }
 7:
 8:  begin
 9:     for every attached user terminal i do
10:        task_set_manager (TERMINAL, i);
11:     endfor;
12:     file_system_manager;
13:     processor_utilization_manager;
14:  end system_initiator;

1.1.2 Task Set Manager

process task_set_manager (case input_origin: inp_orig of
                             TERMINAL: (term: terminal_address);
                             CMNDFILE: (fd: filedescriptor)
                          end);

{ Every terminal and every executing command file are assigned
  a 'task_set_manager' process.  When a process of this type
  is activated, one of two sets of parameters is passed to it
  depending upon the source of input to the process.  If the
  process is assigned to handle input from a terminal, the
  address of the terminal is provided.  If the process is
  assigned to handle input from a command file, the file
  descriptor for the command file is provided. }

var
   tg: task_graph_pointer;
   command_line: string;
   msg: message_pointer;

begin
   while <either the terminal is attached or the end
          of the file has not been reached> do
      <get the next work request and store it in command_line>;
      new (tg);
      parse (command_line, tg);
      <send a message of type M1 (file availability request) to
         the file_system_manager on this node that contains the
         names of files needed for this work request>;
      <send a message of type M2 (processor utilization request)
         to the processor_utilization_manager on this node>;
      <wait for a message from processor_utilization_manager>;
      <store processor utilization information in tg^>;
      <wait for a message from file_system_manager>;
      <store file availability information in tg^>;
      if work_distributor_and_resource_allocator (tg) = ERR then
         { work distribution and resource allocation
           decision could not be made }
         <report error>;
         if input_origin = CMNDFILE then
            exit   { leave the loop }
         else
            next   { next iteration of loop }
         endif;
      endif;
      <send a message of type M3 (file lock and release request)
         to the file_system_manager on this node>;
      <wait for a message from file_system_manager>;
      if <all locks could not be applied> then
         <report error>;
         <send a message of type M4 (file release request)
            to the file_system_manager on this node>;
         if input_origin = CMNDFILE then
            exit   { leave the loop }
         else
            next   { next iteration of loop }
         endif;
      endif;
      for <all files chosen to be copied before execution> do
         <send a message of type M5 (file copy request) to the
            file_system_manager on this node>;
      endfor;
      if <files need copying> then
         <wait for a message from the file_system_manager>;
      endif;
      for <each node i chosen to execute parts of the
           work request> do
         <send a message of type M6 (process activation request)
            to the process_manager on node i>;
      endfor;
      repeat
         <wait for a termination message from a process_manager
            or a request to terminate the command file from
            the process_manager that activated this
            task_set_manager>;
         if <this is a termination message from a
             process_manager> then
            <mark the terminated task as completed in tg^>;
            <send a message of type M4 (file release request)
               to the file_system_manager on this node>;
            if <the termination status indicated that the
                process terminated due to an error> then
               for <each node i still running parts of this
                    work request> do
                  <send a message of type M7 (process kill request)
                     to the process_manager on node i>;
               endfor;
            endif;
         else
            for <every task of the work request> do
               if <the task has not completed> then
                  <send a message of type M7 (process kill request)
                     to the process_manager responsible for
                     the task>;
               endif;
            endfor;
            break;   { exit the loop }
         endif;
      until <all tasks have terminated>;
   endwhile;
end task_set_manager;


1.1.3 File System Manager

process file_system_manager;

{ Every node possesses one of these processes.  This process
  satisfies various requests concerning the file system.
  This is accomplished by communicating with the file_set_managers
  on all nodes. }

var
   msg: message_pointer;
   favptr: file_availability_rec_pointer;
   flrptr: file_lock_and_release_rec_pointer;

begin
   loop
      <wait for a message of any type (let msg point to
         the message)>;
      case msg^.message_type of
         M1:  { file availability information request }
            begin
               new (favptr);
               <insert the record favptr points to into the
                  list of fav_recs>;
               <record the names of the files identified in msg^>;
               for <each node i> do
                  <send a message of type M8 (file availability
                     request) to the file_set_manager on node i
                     that contains the names of all files>;
               endfor;
            end;
         M3:  { file lock and release request }
            begin
               new (flrptr);
               <insert the record flrptr points to into the
                  list of flr_recs>;
               for <each node i> do
                  <send a message of type M9 (file lock and
                     release request) to the file_set_manager
                     on node i that contains the names of all
                     files from msg^ that are identified
                     as being located at node i>;
               endfor;
            end;
         M4:  { file release request }
            begin
               for <each node i> do
                  <send a message of type M10 (file release
                     request) to the file_set_manager on
                     node i that contains the names of all
                     files from msg^ that are identified as
                     being located at node i>;
               endfor;
            end;
         M5:  { file copy request }
            begin
               new (fmvptr);
               <insert the record fmvptr points to into the list
                  of fmv_recs>;
               for <each file named in msg^> do
                  <insert the file name into fmvptr^>;
                  <send a message of type M11 (create file request)
                     to the file_set_manager on the node where
                     the file is to be copied>;
               endfor;
            end;
         M12: { file availability info from file_set_manager }
            begin
               <let favptr point to the fav_rec that msg^
                  is a response to>;
               <fill in the availability information in favptr^>;
               if <responses from all file_set_managers
                   have been received> then
                  <send a message of type M16 (file availability
                     information) to the task_set_manager
                     identified by a field of favptr^>;
               endif;
            end;
         M13: { file lock and release results from file_set_manager }
            begin
               <let flrptr point to the flr_rec that msg^
                  is a response to>;
               <fill in the lock and release results in flrptr^>;
               if <responses from all file_set_managers
                   that were contacted have been received> then
                  <send a message of type M17 (results of file
                     lock and release request) to the task_set_manager
                     identified by a field of flrptr^>;
               endif;
            end;
         M14: { result of file creation request from file_set_manager }
            begin
               { This message is part of a series of messages
                 used to copy a file from one node to another.
                 At this point, file processes have been activated
                 at both the sending and receiving nodes.  The
                 next step is to send a signal to the sending
                 process to begin transmission. }
               <send a message of type M18 (signal to begin copy)
                  to the sending file process in the copy
                  operation>;
            end;
         M15: { copy completion signal from a file process }
            begin
               <let fmvptr point to the fmv_rec that msg^
                  is a response to>;
               <record in fmvptr^ that the copy operation
                  indicated in msg^ has been completed>;
               if <all copy operations have been completed> then
                  <send a message of type M19 (results of file
                     copy request) to the task_set_manager
                     identified by a field of fmvptr^>;
               endif;
            end;
      endcase;
   endloop;
end file_system_manager;
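
The way the file system manager collects M12 replies before answering the task set manager can be sketched in Python. This is a loose illustration and not a translation of the pseudo code: the class `AvailabilityRequest` and its fields are hypothetical stand-ins for the fav_rec record.

```python
class AvailabilityRequest:
    """Bookkeeping for one outstanding file availability request."""

    def __init__(self, requester, files, nodes):
        self.requester = requester           # task set manager to answer
        self.waiting = set(nodes)            # nodes that have not replied
        self.info = {name: [] for name in files}

    def record(self, node, available_files):
        """Store one node's reply.

        Returns True once every contacted node has replied, which is the
        point at which the collected information would be sent on.
        """
        self.waiting.discard(node)
        for name in available_files:
            if name in self.info:
                self.info[name].append(node)
        return not self.waiting
```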


1.1.4 Processor Utilization Manager


 1:  process processor_utilization_manager;
 2:  { Every node possesses one of these processes.  This process
 3:    records the latest processor utilization information received
 4:    from each node's processor_utilization_monitor; it provides
 5:    task_set_managers with this information on demand; and
 6:    if it does not hear from a processor_utilization_monitor
 7:    within a particular interval of time, it records the processor
 8:    as down and attempts to contact that processor_utilization_monitor. }
 9:
10:  var
11:     msg: message_pointer;
12:     pcutil: array [NODES_OF_THE_NET] of pc_utilization;
13:
     begin
        loop
           <wait for a message of any type (let msg point to
              the message)>;
           case msg^.message_type of
              M2:  { pc utilization information request }
                 begin
                    <send a message of type M20 (pc utilization
                       information) to the task_set_manager that
                       sent the message and is identified in msg^>;
                 end;
              M21: { pc utilization information from monitor }
                 begin
                    <record information in msg^ in pcutil [msg^.node]>;
                    <reset deadman timer for information arriving
                       from node msg^.node>;
                 end;
              M22: { deadman timer signal - this indicates that a
                     processor_utilization_monitor has not reported
                     within the required time }
                 begin
                    pcutil [msg^.node] := NOT_AVAILABLE;
                    <send a message of type M23 ("are you alive?"
                       query) to the processor_utilization_monitor
                       on node msg^.node>;
                 end;
           endcase;
        endloop;
     end processor_utilization_manager;
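
The deadman timer behaviour can be sketched with explicit timestamps instead of timer signals. This is an illustrative sketch under assumptions: the class `UtilizationTable`, its method names, and the use of `None` for NOT_AVAILABLE are all hypothetical, and the caller supplies the current time.

```python
NOT_AVAILABLE = None  # hypothetical marker for a node recorded as down


class UtilizationTable:
    """Latest utilization per node with a per-node deadman deadline."""

    def __init__(self, nodes, timeout):
        self.timeout = timeout
        self.pcutil = {n: NOT_AVAILABLE for n in nodes}
        self.deadline = {n: timeout for n in nodes}

    def report(self, node, value, now):
        """Handle a utilization report: store it and reset the timer."""
        self.pcutil[node] = value
        self.deadline[node] = now + self.timeout

    def expire(self, now):
        """Mark silent nodes as down.

        Returns the nodes whose monitors should be sent an
        "are you alive?" query.
        """
        silent = [n for n in sorted(self.deadline) if self.deadline[n] <= now]
        for n in silent:
            self.pcutil[n] = NOT_AVAILABLE
            self.deadline[n] = now + self.timeout
        return silent
```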


1.1.5 Processor Utilization Monitor


 1:  process processor_utilization_monitor;
 2:  { Every node possesses one of these processes.  This process
 3:    records various performance measurements and computes a
 4:    processor utilization value that is periodically transmitted
 5:    to all processor_utilization_managers. }
 6:
 7:  begin
 8:     loop
 9:        <gather performance measurements>;
10:        <compute processor utilization value>;
11:        for <each node i> do
12:           <send a message of type M21 (processor utilization
13:              information) to the processor_utilization_manager
14:              on node i>;
15:        endfor;
16:        <sleep until it is time to gather more measurements>;
17:        <wait until it is time to gather more measurements
18:           or a message from a processor_utilization_manager
19:           arrives>;
20:     endloop;
21:  end processor_utilization_monitor;


1.1.6 Process Manager

process process_manager;

{ Every node possesses one of these processes.  This process
  manages the processes that are executing on its node. }

var
   pcbptr: process_control_block_pointer;
   process_name_table: process_name_to_pcbptr_map;
   msg: message_pointer;

begin
   loop
      <wait for the arrival of a message (let msg point
         to the message)>;
      case msg^.message_type of
         M6:  { process activation request }
            begin
               if <process type is an object file> then
                  new (pcbptr);
                  <record process identifying information
                     and pcbptr in process_name_table>;
                  <fill in the necessary information in pcbptr^>;
                  <initiate the loading of the process>;
               else
                  task_set_manager (CMNDFILE, msg^.file_descriptor);
                  <record process identifying information
                     and task_set_manager identification in
                     process_name_table>;
               endif;
            end;
         M7:  { process kill request }
            begin
               <find the process in process_name_table>;
               if <the process is an object file> then
                  <terminate the process>;
                  <unload the process>;
                  <dispose of the process control block>;
                  <send a message of type M24 (process
                     termination message) to the task_set_manager
                     that activated the process>;
               else  { the process is a command file }
                  <send a message of type M25 (request to terminate
                     the execution of a command file) to the
                     task_set_manager executing this command file>;
               endif;
            end;
      endcase;
   endloop;
end process_manager;


1.1.7 File Set Manager

process file_set_manager;

{ Every node possesses one of these processes.  This process
  manages the files located on its node. }

var
   msg: message_pointer;
   file_directory: file_location_information;

begin
   loop
      <wait for the arrival of a message (let msg point
         to the message)>;
      case msg^.message_type of
         M8:  { file availability request }
            begin
               for <each file named in msg^> do
                  <search for the file>;
                  if <the file was found> then
                     if <the file is free> then
                        <reserve the file>;
                        <record the desired access to the file>;
                        <note that the file is available>;
                     else
                        if <the desired access to the file
                            is READ> and <the access already
                            granted to the file is READ> then
                           <note that the file is available>;
                        else
                           <note that the file is not available>;
                        endif;
                     endif;
                  else
                     <note that the file is not available>;
                  endif;
               endfor;
               <send a message of type M12 (file availability
                  information) to the file_system_manager
                  on node msg^.node>;
            end;
         M9:  { file lock and release request }
            begin
               for <each file in msg^> do
                  <search for the file>;
                  if <the file was found> then
                     <lock or release the file as requested>;
                  else
                     <note that the request could not be satisfied>;
                  endif;
               endfor;
               <send a message of type M13 (results of file lock
                  and release request) to the file_system_manager
                  on node msg^.node>;
            end;
         M10: { file release request }
            begin
               for <each file in msg^> do
                  <search for the file and release the lock on it>;
               endfor;
            end;
         M11: { file creation request }
            begin
               <create an entry for a new file in file_directory>;
               <activate a file process for the file>;
               <send a message of type M14 (results of file
                  creation) to the file_system_manager on
                  node msg^.node>;
            end;
      endcase;
   endloop;
end file_set_manager;
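
The availability test applied to each file in the M8 case can be sketched in Python. This is a simplified illustration with hypothetical names: `availability` and the dictionary-based directory are assumptions, and a directory entry holds only the access mode currently granted (or `None` when the file is free).

```python
READ, WRITE = "READ", "WRITE"


def availability(requests, directory):
    """Evaluate a file availability request against one node's directory.

    requests is a list of (file_name, desired_access) pairs.  directory
    maps a file name to the access already granted, or None when free.
    Free files are reserved as a side effect.  Returns a dict mapping
    each file name to True (available) or False (not available).
    """
    result = {}
    for name, access in requests:
        if name not in directory:
            result[name] = False              # file not found on this node
        elif directory[name] is None:
            directory[name] = access          # reserve and record access
            result[name] = True
        else:
            # A file already in use is available only for shared reading.
            result[name] = (access == READ and directory[name] == READ)
    return result
```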

1.2 PSEUDO CODE FOR THE XFDPS.2 CONTROL MODEL

1.2.1 System Initiator

Same as XFDPS.1.

1.2.2 Task Set Manager

XFDPS.1 with the following changes:

25:  <send a message of type M1 (file availability request) to
26:     the file_system_manager on node 1 that contains the
27:     names of files needed for this work request>;

44:  <send a message of type M3 (file lock and release request)
45:     to the file_system_manager on node 1>;

76:  <send a message of type M4 (file release request)
77:     to the file_system_manager on node 1>;

1.2.3  File  System  Manager 
process file_system_manager;

{ This process resides on node 1 and satisfies various requests
  concerning the file system. This process maintains the
  centralized file system directory. }

var
    msg: message_pointer;

begin
    loop
        <wait for a message of any type (let msg point to
         the message)>;
        case msg^.message_type of

        M1: { file availability information request }
            begin
                for <each file named in msg^> do
                    <search for the file>;
                    if <the file was found> then
                        for <each node i> do
                            if <the file is free on node i> then
                                <reserve the file>;
                                <record the desired access to the file>;
                                <note that the file is available on
                                 node i>;
                            else
                                if <the desired access to the file
                                    is READ> and <the access already
                                    granted to the file is READ> then
                                    <note that the file is available on
                                     node i>;
                                else
                                    <note that the file is not available
                                     on node i>;
                                endif;
                            endif;
                        endfor;
                    else
                        <note that the file is not available on
                         any node>;
                    endif;
                endfor;
                <send a message of type M12 (file availability
                 information) to the task_set_manager requesting
                 the information>;
            end;

        M3: { file lock and release request }
            begin
                for <each file in msg^> do
                    <search for the file>;
                    if <the file was found and is present
                        on the node specified> then
                        <lock or release the file as requested>;
                    else
                        <note that the request could not be satisfied>;
                    endif;
                endfor;
                <send a message of type M13 (results of file lock
                 and release request) to the task_set_manager
                 that made the request>;
            end;

        M4: { file release request }
            begin
                for <each file in msg^> do
                    <search for the file and release the lock on it>;
                endfor;
            end;

        endcase;
    endloop;

end file_system_manager;
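
The availability test in the M1 branch above can be illustrated with a short Python sketch. It is not part of the simulator; the names FileCopy and check_availability, the dictionary layout of the directory, and the treatment of a node that holds no copy of the file are assumptions made for the illustration.

```python
from dataclasses import dataclass
from typing import Optional

READ, WRITE = "READ", "WRITE"


@dataclass
class FileCopy:
    """State of one copy of a file on one node (hypothetical structure)."""
    reserved: bool = False
    access: Optional[str] = None  # access mode already granted, if any


def check_availability(directory, requests, nodes):
    """Return {file name: {node: available?}} for a list of (name, access) requests.

    directory maps file name -> {node: FileCopy}. A node holding no copy
    of the file is reported as unavailable (an assumption of this sketch).
    """
    result = {}
    for name, wanted in requests:
        copies = directory.get(name)
        if copies is None:
            # File not found: not available on any node.
            result[name] = {n: False for n in nodes}
            continue
        avail = {}
        for n in nodes:
            copy = copies.get(n)
            if copy is None:
                avail[n] = False
            elif not copy.reserved:
                # Free copy: reserve it and record the desired access.
                copy.reserved = True
                copy.access = wanted
                avail[n] = True
            else:
                # Reserved copy: only READ on top of READ may be shared.
                avail[n] = (wanted == READ and copy.access == READ)
        result[name] = avail
    return result
```

In this sketch a second READ request for a copy already reserved for READ is reported as available, while any request involving WRITE on a reserved copy is refused, which mirrors the nested if in the pseudo code.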

1.2.4  Process  Utilization  Manager 

Same  as  XFDPS.1. 

1.2.5  Processor  Utilization  Monitor 
Same  as  XFDPS.1. 

1.2.6  Process  Manager 

Same as XFDPS.1.


1.3 PSEUDO CODE FOR THE XFDPS.3 CONTROL MODEL

1.3.1 System Initiator

Same as XFDPS.1.

1.3.2 Task Set Manager
Same as XFDPS.1.

1.3.3  File  System  Manager 
XFDPS.1  with  the  following  changes: 


23:   <send a message of type M8 (file availability
24:    request) to the file_set_manager on the same node
25:    as this file_system_manager>;
26:
27:


69:   if <this response is from this node> and
70:      <all files have not been found available> then
71:       for <every other node i> do
72:           <send a message of type M8 (file availability
73:            request) to the file_set_manager on node i>;
74:       endfor;
74a:  else
74b:      if <responses from all file_set_managers have been
74c:          received or all files have been found locally> then
74d:          <send a message of type M16 (file availability
74e:           information) to the task_set_manager identified
74f:           by a field of favptr^>;
74g:      endif;
74h:  endif;
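
The search order implied by these changes (ask the local file_set_manager first, and query the other nodes only if some file is still missing) can be sketched in Python as follows. The function locate_files, the catalogs dictionary standing in for the file_set_managers, and the message count are hypothetical; the sketch also assumes that each remote query asks about every requested file.

```python
def locate_files(wanted, local_node, catalogs):
    """Locate files using a local-first search.

    catalogs maps node -> set of available file names and stands in for
    the file_set_manager on each node (an assumption of this sketch).
    Returns (found, messages): found maps file name -> list of nodes, and
    messages counts the M8-style requests that would have been sent.
    """
    # Phase 1: one request to the file_set_manager on the local node.
    found = {name: [local_node] for name in wanted
             if name in catalogs[local_node]}
    messages = 1

    # Phase 2: only if something is missing, ask every other node.
    if len(found) < len(wanted):
        for node, files in catalogs.items():
            if node == local_node:
                continue
            messages += 1
            for name in wanted:
                if name in files:
                    found.setdefault(name, []).append(node)
    return found, messages
```

When every file is available locally the sketch issues a single request, which is the saving that distinguishes this variant from one that always queries all nodes.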

1.3.4 Process Utilization Manager

Same  as  XFDPS.1. 

1.3.5  Processor  Utilization  Monitor 
Same  as  XFDPS.1. 

1.3.6 Process Manager

Same  as  XFDPS.1. 

1.3.7 File Set Manager

Same  as  XFDPS.1. 

1.4 PSEUDO CODE FOR THE XFDPS.4 CONTROL MODEL

1.4.1 System Initiator

Same as XFDPS.1.

1.4.2 Task Set Manager
Same as XFDPS.1.

1.4.3  File  System  Manager 
process file_system_manager;

{ Every node possesses one of these processes. This process
  satisfies various requests concerning the file system and
  helps maintain the redundant copies of the file system
  directory. }

var
    msg: message_pointer;

begin
    loop
        <wait for a message of any type (let msg point to
         the message)>;
        case msg^.message_type of

        M1, M3, M4: { availability, lock, and release requests }
            begin
                <place the message on the queue of file system
                 requests arriving at this node>;
            end;

        CV: { control vector }
            begin
                while <the file system request queue is
                       not empty> do
                    <remove a message from the queue (let msg point
                     to the message)>;
                    case msg^.message_type of

                    M1: { file availability information request }
                        begin
                            for <each file named in msg^> do
                                <search for the file>;
                                if <the file was found> then
                                    for <each node i> do
                                        if <the file is free on node i> then
                                            <reserve the file>;
                                            <record the desired access to the file>;
                                            <note that the file is available on
                                             node i>;
                                        else
                                            if <the desired access to the file
                                                is READ> and <the access already
                                                granted to the file is READ> then
                                                <note that the file is available on
                                                 node i>;
                                            else
                                                <note that the file is not available
                                                 on node i>;
                                            endif;
                                        endif;
                                    endfor;
                                else
                                    <note that the file is not available on
                                     any node>;
                                endif;
                            endfor;
                            <send a message of type M12 (file availability
                             information) to the task_set_manager requesting
                             the information>;
                        end;

                    M3: { file lock and release request }
                        begin
                            for <each file in msg^> do
                                <search for the file>;
                                if <the file was found and is present
                                    on the node specified> then
                                    <lock or release the file as requested>;
                                else
                                    <note that the request could not be satisfied>;
                                endif;
                            endfor;
                            <send a message of type M13 (results of file lock
                             and release request) to the task_set_manager
                             that made the request>;
                        end;

                    M4: { file release request }
                        begin
                            for <each file in msg^> do
                                <search for the file and release the lock on it>;
                            endfor;
                        end;

                    endcase;
                endwhile;

                <send a message of type UPV (update vector) to the
                 next node (according to the predetermined
                 ordering of nodes) containing the changes just
                 made to the file system directory>;
            end;

        UPV: { update vector }
            begin
                if <this UPV was originated by this node> then
                    <send a message of type CV (control vector) to
                     the next node (according to the predetermined
                     ordering of nodes)>;
                else
                    <update the file system directory>;
                    <send the message of type UPV (update vector)
                     to the next node (according to the predetermined
                     ordering of nodes)>;
                endif;
            end;

        endcase;
    endloop;

end file_system_manager;
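
The circulation of the control vector (CV) and update vector (UPV) can be illustrated with a small single-threaded Python sketch. It is only a model of the idea: the Node class, the circulate function, the ring ordering by node index, and the simplified lock/release requests are assumptions of the sketch, and message passing is replaced by direct calls.

```python
from collections import deque


class Node:
    """One node holding a replica of the file system directory (hypothetical)."""

    def __init__(self, node_id, n_nodes):
        self.id = node_id
        self.next_id = (node_id + 1) % n_nodes  # assumed predetermined ordering
        self.queue = deque()   # requests queued while the CV is elsewhere
        self.directory = {}    # replica: file name -> lock owner or None

    def on_cv(self):
        """Process all queued requests; return the directory changes made."""
        changes = {}
        while self.queue:
            op, name, owner = self.queue.popleft()
            if op == "lock" and self.directory.get(name) is None:
                changes[name] = owner
            elif op == "release" and self.directory.get(name) == owner:
                changes[name] = None
            # Apply at once so later requests in the batch see the change.
            self.directory.update(changes)
        return changes


def circulate(nodes, start, rounds):
    """Pass the CV around the ring `rounds` times."""
    holder = start
    for _ in range(rounds * len(nodes)):
        node = nodes[holder]
        changes = node.on_cv()
        # The UPV visits every other node, which applies the update and
        # forwards it; when it returns to its originator the CV moves on.
        hop = node.next_id
        while hop != holder:
            nodes[hop].directory.update(changes)
            hop = nodes[hop].next_id
        holder = node.next_id
```

Because only the node holding the CV modifies the directory, and every replica is updated before the CV moves on, two nodes that queue conflicting lock requests see them resolved in ring order and all replicas stay identical.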

1.4.4 Process Utilization Manager
Same as XFDPS.1.

1.4.5 Processor Utilization Monitor
Same as XFDPS.1.

1.4.6  Process  Manager 
Same  as  XFDPS.1. 

1.5 PSEUDO CODE FOR THE XFDPS.5 CONTROL MODEL

1.5.1 System Initiator

Same as XFDPS.1.

1.5.2 Task Set Manager
Same as XFDPS.1.

1.5.3  File  System  Manager 

Same  as  XFDPS.1. 

1.5.4  Process  Utilization  Manager 

Same  as  XFDPS.1. 

1.5.5  Processor  Utilization  Monitor 
Same as XFDPS.1.

1.5.6  Process  Manager 
Same  as  XFDPS.1. 

1.5.7  File  Set  Manager 

XFDPS.1  with  the  following  changes: 

20:  <note that the file is available>;
21:
22:


1.6 PSEUDO CODE FOR THE XFDPS.6 CONTROL MODEL

1.6.1 System Initiator
Same as XFDPS.1.

1.6.2 Task Set Manager

XFDPS.1  with  the  following  changes: 

75:  for <each task in the message> do
76:      <mark the task as completed in tg^>;
77:  endfor;

87:  for <every node i still executing parts of the work
88:       request> do
89:      <send a message of type M7 (process kill request)
90:       to the process_manager on node i>;
91:  endfor;
92:
93:

1.6.3 File System Manager
Same  as  XFDPS.1. 

1.6.4  Process  Utilization  Manager 
Same  as  XFDPS.1. 

1.6.5 Processor Utilization Monitor

Same  as  XFDPS.1. 

1.6.6  Process  Manager 

process process_manager;

{ Every node possesses one of these processes. This process
  manages the processes that are executing on its node. }

var
    pcbptr: process_control_block_pointer;
    process_name_table: process_name_to_pcbptr_map;
    subtg: task_graph_pointer;
    msg: message_pointer;

begin
    loop
        <wait for the arrival of a message (let msg point
         to the message)>;
        case msg^.message_type of

        M6: { process activation request }
            begin
                new (subtg);
                for <each task i in msg^> do
                    <record task i in subtg^>;
                    if <task i names an object file> then
                        new (pcbptr);
                        <record process identifying information
                         and pcbptr in process_name_table>;
                        <fill in the necessary information in pcbptr^>;
                        <initiate the loading of the process>;
                    else
                        task_set_manager (CMNDFILE, msg^.file_descriptor);
                        <record process identifying information
                         and task_set_manager identification in
                         process_name_table>;
                    endif;
                endfor;
                <link subtg^ onto the list of subtaskgraphs executing
                 on this node>;
            end;

        M7: { process kill request }
            begin
                <find the subtaskgraph in the list of
                 subtaskgraphs executing on this node (let
                 subtg point to the subtaskgraph)>;
                for <each task i in subtg^> do
                    if <task i has not completed> then
                        if <task i names an object file> then
                            <terminate the process>;
                            <unload the process>;
                            <dispose of the process control block>;
                            <mark task i as terminated>;
                        else { the process is a command file }
                            <send a message of type M25 (request to terminate
                             the execution of a command file) to the
                             task_set_manager executing this command file>;
                        endif;
                    endif;
                endfor;
                if <all the tasks in subtg^ have completed> then
                    <send a message of type M2M (subtaskgraph
                     termination message) to the task_set_manager
                     that activated the subtaskgraph>;
                    <remove subtg^ from the list of subgraphs
                     executing on this node>;
                    dispose (subtg);
                endif;
            end;

        endcase;
    endloop;

end process_manager;
APPENDIX 2
SIMULATION  RESULTS 

2.1  RESULTS  OF  GROUP  1  EXPERIMENTS 
2.3 RESULTS OF A SINGLE NODE SIMULATION

Average Work Request Response Time for
a Single Node Network

        Average Response Time
Run             (sec)

 1              44.6
 2              44.1
 3              43.7
 4              43.7
 5              44.2

Mean:  44.1 seconds
Standard Deviation:  0.38
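
As a quick check, the summary figures can be recomputed from the five runs with Python's standard statistics module. The report does not state which standard deviation formula was used; the sketch assumes the sample (n - 1) form, and the variable names are illustrative only.

```python
import statistics

# Average response times (sec) for runs 1 through 5, from the table above.
times = [44.6, 44.1, 43.7, 43.7, 44.2]

mean = statistics.mean(times)
sample_sd = statistics.stdev(times)       # divides by n - 1
population_sd = statistics.pstdev(times)  # divides by n

print(f"mean = {mean:.1f} s")
print(f"sample standard deviation = {sample_sd:.2f} s")
print(f"population standard deviation = {population_sd:.2f} s")
```

The sample form rounds to the reported 0.38, whereas the population form gives a smaller value, which suggests that the n - 1 formula was used.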


